**Jean Goubault-Larrecq · Barbara König (Eds.)**

# **Foundations of Software Science and Computation Structures**

**23rd International Conference, FOSSACS 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25–30, 2020, Proceedings**

## Lecture Notes in Computer Science 12077

Founding Editors

Gerhard Goos, Germany
Juris Hartmanis, USA

## Editorial Board Members

Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Gerhard Woeginger, Germany
Moti Yung, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407

# Foundations of Software Science and Computation Structures

23rd International Conference, FOSSACS 2020 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020 Dublin, Ireland, April 25–30, 2020 Proceedings

Editors

Jean Goubault-Larrecq
Université Paris-Saclay, ENS Paris-Saclay, CNRS
Cachan, France

Barbara König University of Duisburg-Essen Duisburg, Germany

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-45230-8
ISBN 978-3-030-45231-5 (eBook)
https://doi.org/10.1007/978-3-030-45231-5

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

## ETAPS Foreword

Welcome to the 23rd ETAPS! This is the first time that ETAPS has taken place in Ireland, in its beautiful capital, Dublin.

ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming language development, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized conference program enables researchers to participate in an exciting event, with the possibility of meeting many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe. Also, for the second time, an ETAPS Mentoring Workshop was organized. This workshop is intended to help students early in their careers with advice on research, career, and life in the fields of computing that are covered by the ETAPS conferences.

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem Visser (Stellenbosch University). Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine Learning and Formal Methods. On behalf of the ETAPS 2020 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of Limerick and Lero. ETAPS 2020 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Tiziana Margaria (general chair, UL and Lero), Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque (Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago).

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida (London).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoyed ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020

Marieke Huisman
ETAPS SC Chair
ETAPS e.V. President

## Preface

This volume contains the papers presented at the 23rd International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), which took place in Dublin, Ireland, during April 27–30, 2020. The conference series is dedicated to foundational research with a clear significance for software science. It brings together research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

This volume contains 31 contributed papers selected from 98 full paper submissions, and also a paper accompanying an invited talk by Scott Smolka (Stony Brook University, USA). Each submission was reviewed by at least three Program Committee members, with the help of external reviewers, and the final decisions took into account the feedback from a rebuttal phase. The conference submissions were managed using the EasyChair conference system, which was also used to assist with the compilation of these proceedings.

We wish to thank all the authors who submitted papers to FoSSaCS 2020, the Program Committee members, the Steering Committee members and the external reviewers. In addition, we are grateful to the ETAPS 2020 Organization for providing an excellent environment for FoSSaCS 2020 alongside the other ETAPS conferences and workshops.

February 2020

Jean Goubault-Larrecq
Barbara König

## Organization

## Program Committee

Parosh Aziz Abdulla, Uppsala University, Sweden
Thorsten Altenkirch, University of Nottingham, UK
Paolo Baldan, Università di Padova, Italy
Nick Benton, Facebook, UK
Frédéric Blanqui, Inria and LSV, France
Michele Boreale, Università di Firenze, Italy
Corina Cirstea, University of Southampton, UK
Pedro R. D'Argenio, Universidad Nacional de Córdoba, CONICET, Argentina
Josée Desharnais, Université Laval, Canada
Jean Goubault-Larrecq, Université Paris-Saclay, ENS Paris-Saclay, CNRS, LSV, Cachan, France
Ichiro Hasuo, National Institute of Informatics, Japan
Delia Kesner, IRIF, Université de Paris, France
Shankara Narayanan Krishna, IIT Bombay, India
Barbara König, Universität Duisburg-Essen, Germany
Sławomir Lasota, University of Warsaw, Poland
Xavier Leroy, Collège de France and Inria, France
Leonid Libkin, University of Edinburgh, UK, and ENS Paris, France
Jean-Yves Marion, LORIA, Université de Lorraine, France
Dominique Méry, LORIA, Université de Lorraine, France
Matteo Mio, LIP, CNRS, ENS Lyon, France
Andrzej Murawski, University of Oxford, UK
Prakash Panangaden, McGill University, Canada
Amr Sabry, Indiana University Bloomington, USA
Lutz Schröder, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Sebastian Siebertz, Universität Bremen, Germany
Benoît Valiron, LRI, CentraleSupélec, Université Paris-Saclay, France

## Steering Committee

Andrew Pitts (Chair), University of Cambridge, UK
Christel Baier, Technische Universität Dresden, Germany
Lars Birkedal, Aarhus University, Denmark
Ugo Dal Lago, Università degli Studi di Bologna, Italy
Javier Esparza, Technische Universität München, Germany
Anca Muscholl, LaBRI, Université de Bordeaux, France
Frank Pfenning, Carnegie Mellon University, USA

## Additional Reviewers

Accattoli, Beniamino Alvim, Mario S. André, Étienne Argyros, George Arun-Kumar, S. Ayala-Rincon, Mauricio Bacci, Giorgio Bacci, Giovanni Balabonski, Thibaut Basile, Davide Berger, Martin Bernardi, Giovanni Bisping, Benjamin Bodeveix, Jean-Paul Bollig, Benedikt Bonchi, Filippo Bonelli, Eduardo Boulmé, Sylvain Bourke, Timothy Bradfield, Julian Breuvart, Flavien Bruni, Roberto Bruse, Florian Capriotti, Paolo Carette, Jacques Carette, Titouan Carton, Olivier Cassano, Valentin Chadha, Rohit Charguéraud, Arthur Cho, Kenta Choudhury, Vikraman Ciancia, Vincenzo Clemente, Lorenzo Colacito, Almudena Corradini, Andrea Czerwiński, Wojciech de Haan, Ronald de Visme, Marc

Dell'Erba, Daniele Deng, Yuxin Eickmeyer, Kord Exibard, Leo Faggian, Claudia Fijalkow, Nathanaël Filali-Amine, Mamoun Francalanza, Adrian Frutos Escrig, David Galletta, Letterio Ganian, Robert Garrigue, Jacques Gastin, Paul Genaim, Samir Genest, Blaise Ghica, Dan Goncharov, Sergey Gorla, Daniele Guerrini, Stefano Hirschowitz, Tom Hofman, Piotr Hoshino, Naohiko Howar, Falk Inverso, Omar Iván, Szabolcs Jaax, Stefan Jeandel, Emmanuel Johnson, Michael Kahrs, Stefan Kamburjan, Eduard Katsumata, Shin-Ya Kerjean, Marie Kiefer, Stefan Komorida, Yuichi Kop, Cynthia Kremer, Steve Kuperberg, Denis Křetínský, Jan Laarman, Alfons

Laurent, Fribourg Levy, Paul Blain Li, Yong Licata, Daniel R. Liquori, Luigi Lluch Lafuente, Alberto Lopez, Aliaume Malherbe, Octavio Manuel, Amaldev Manzonetto, Giulio Matache, Christina Matthes, Ralph Mayr, Richard Melliès, Paul-André Merz, Stephan Miculan, Marino Mikulski, Łukasz Moser, Georg Moss, Larry Munch-Maccagnoni, Guillaume Muskalla, Sebastian Nantes-Sobrinho, Daniele Nestra, Härmel Neumann, Eike Neves, Renato Niehren, Joachim Padovani, Luca Pagani, Michele Paquet, Hugo Patterson, Daniel Pedersen, Mathias Ruggaard Peressotti, Marco Pitts, Andrew Potapov, Igor Power, John Praveen, M. Puppis, Gabriele Péchoux, Romain Pérez, Guillermo Quatmann, Tim Rabinovich, Roman Radanne, Gabriel Rand, Robert Ravara, António Remy, Didier

Reutter, Juan L. Rossman, Benjamin Rot, Jurriaan Rowe, Reuben Ruemmer, Philipp Sammartino, Matteo Sankaran, Abhisekh Sankur, Ocan Sattler, Christian Schmitz, Sylvain Serre, Olivier Shirmohammadi, Mahsa Siles, Vincent Simon, Bertrand Simpson, Alex Singh, Neeraj Sprunger, David Srivathsan, B. Staton, Sam Stolze, Claude Straßburger, Lutz Streicher, Thomas Tan, Tony Tawbi, Nadia Toruńczyk, Szymon Tzevelekos, Nikos Urbat, Henning van Bakel, Steffen van Breugel, Franck van de Pol, Jaco van Doorn, Floris Van Raamsdonk, Femke Vaux Auclair, Lionel Verma, Rakesh M. Vial, Pierre Vignudelli, Valeria Vrgoc, Domagoj Waga, Masaki Wang, Meng Witkowski, Piotr Zamdzhiev, Vladimir Zemmari, Akka Zhang, Zhenya Zorzi, Margherita

## Contents





## **Neural Flocking: MPC-based Supervised Learning of Flocking Controllers**

Usama Mehmood<sup>1</sup>, Shouvik Roy<sup>1</sup>, Radu Grosu<sup>2</sup>, Scott A. Smolka<sup>1</sup>, Scott D. Stoller<sup>1</sup>, and Ashish Tiwari<sup>3</sup>

<sup>1</sup> Stony Brook University, Stony Brook, NY, USA umehmood@cs.stonybrook.edu

<sup>2</sup> Technische Universität Wien, Wien, Austria

<sup>3</sup> Microsoft Research, San Francisco, CA, USA

**Abstract.** We show how a symmetric and fully distributed flocking controller can be synthesized using Deep Learning from a centralized flocking controller. Our approach is based on Supervised Learning, with the centralized controller providing the training data, in the form of trajectories of state-action pairs. We use Model Predictive Control (MPC) for the centralized controller, an approach that we have successfully demonstrated on flocking problems. MPC-based flocking controllers are high-performing but also computationally expensive. By learning a symmetric and distributed neural flocking controller from a centralized MPC-based one, we achieve the best of both worlds: the neural controllers have high performance (on par with the MPC controllers) and high efficiency. Our experimental results demonstrate the sophisticated nature of the distributed controllers we learn. In particular, the neural controllers are capable of achieving myriad flocking-oriented control objectives, including flocking formation, collision avoidance, obstacle avoidance, predator avoidance, and target seeking. Moreover, they generalize the behavior seen in the training data to achieve these objectives in a significantly broader range of scenarios. In terms of verification of our neural flocking controller, we use a form of statistical model checking to compute confidence intervals for its convergence rate and time to convergence.

**Keywords:** Flocking · Model Predictive Control · Distributed Neural Controller · Deep Neural Network · Supervised Learning

## **1 Introduction**

With the introduction of Reynolds' rule-based model [16, 17], it is now possible to understand the flocking problem as one of distributed control. Specifically, in this model, at each time-step, each agent executes a control law given in terms of the weighted sum of three competing forces to determine its next acceleration. Each of these forces has its own rule: *separation* (keep a safe distance away from your neighbors), *cohesion* (move towards the centroid of your neighbors), and *alignment* (steer toward the average heading of your neighbors). Reynolds' controller is *distributed*; i.e., it is executed separately by each agent, using information about only itself and nearby agents, and without communication. Furthermore, it is *symmetric*; i.e., every agent runs the same controller (same code).

Fig. 1: Neural Flocking Architecture

We subsequently showed that a simpler, more declarative approach to the flocking problem is possible [11]. In this setting, flocking is achieved when the agents combine to minimize a system-wide *cost function*. We presented centralized and distributed solutions for achieving this form of "declarative flocking" (DF), both of which were formulated in terms of Model-Predictive Control (MPC) [2].

Another advantage of DF over the rule-based approach exemplified by Reynolds' model is that it allows one to consider additional control objectives (e.g., obstacle and predator avoidance) simply by extending the cost function with additional terms for these objectives. Moreover, these additional terms are typically quite straightforward in nature. In contrast, deriving behavioral rules that achieve the new control objectives can be a much more challenging task.

An issue with MPC is that computing the next control action can be computationally expensive, as MPC searches for an action sequence that minimizes the cost function over a given prediction horizon. This renders MPC unsuitable for real-time applications with short control periods, of which flocking is a prime example. Another potential problem with MPC-based approaches to flocking is their performance (in terms of achieving the desired flight formation), which may suffer in a fully distributed setting.

In this paper, we present *Neural Flocking* (NF), a new approach to the flocking problem that uses Supervised Learning to learn a symmetric and fully distributed flocking controller from a centralized MPC-based controller. By doing so, we achieve the best of both worlds: high performance (on par with the MPC controllers) in terms of meeting flocking flight-formation objectives, and high efficiency leading to real-time flight controllers. Moreover, our NF controllers can easily be parallelized on hardware accelerators such as GPUs and TPUs.

Figure 1 gives an overview of the NF approach. A high-performing centralized MPC controller provides the labeled training data to the learning agent: a symmetric and distributed neural controller in the form of a deep neural network (DNN). The training data consists of trajectories of state-action pairs, where a state contains the information known to an agent at a time step (e.g., its own position and velocity, and the position and velocity of its neighbors), and the action (the label) is the acceleration assigned to that agent at that time step by the centralized MPC controller.

We formulate and evaluate NF in a number of essential flocking scenarios: basic flocking with inter-agent collision avoidance, as in [11], and more advanced

scenarios with additional objectives, including obstacle avoidance, predator avoidance, and target seeking by the flock. We conduct an extensive performance evaluation of NF. Our experimental results demonstrate the sophisticated nature of NF controllers. In particular, they are capable of achieving all of the stated control objectives. Moreover, they generalize the behavior seen in the training data in order to achieve these objectives in a significantly broader range of scenarios. In terms of verification of our neural controller, we use a form of statistical model checking [5, 10] to compute confidence intervals for its rate of convergence to a flock and for its time to convergence.

## **2 Background**

We consider a set of n dynamic agents A = {1,...,n} that move according to the following discrete-time equations of motion:

$$\begin{aligned} p\_i(k+1) &= p\_i(k) + dt \cdot v\_i(k), \quad |v\_i(k)| < \bar{v} \\ v\_i(k+1) &= v\_i(k) + dt \cdot a\_i(k), \quad |a\_i(k)| < \bar{a} \end{aligned} \tag{1}$$

where $p\_i(k) \in \mathbb{R}^2$, $v\_i(k) \in \mathbb{R}^2$, and $a\_i(k) \in \mathbb{R}^2$ are the position, velocity, and acceleration of agent $i \in \mathcal{A}$, respectively, at time step $k$, and $dt \in \mathbb{R}^+$ is the time step. The magnitudes of velocities and accelerations are bounded by $\bar{v}$ and $\bar{a}$, respectively. Acceleration $a\_i(k)$ is the control input for agent $i$ at time step $k$. The acceleration is updated after every $\eta$ time steps; i.e., $\eta \cdot dt$ is the control period. The flock *configuration* at time step $k$ is thus given by the following vectors (in boldface):
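As a concrete illustration, one update step of the discrete-time dynamics of Eq. (1) might be implemented as follows (a minimal numpy sketch; the function name `step` and the clamping details are illustrative, not the authors' implementation):

```python
import numpy as np

def step(p, v, a, dt, v_max, a_max):
    """One step of the discrete-time dynamics of Eq. (1) for all agents.

    p, v, a are (n, 2) arrays of positions, velocities, and accelerations;
    magnitudes are clamped to the bounds a_max (a-bar) and v_max (v-bar).
    """
    # Clamp each agent's acceleration magnitude to a_max.
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    a = a * np.minimum(1.0, a_max / np.maximum(a_norm, 1e-12))
    v_next = v + dt * a
    # Clamp each agent's speed to v_max.
    v_norm = np.linalg.norm(v_next, axis=1, keepdims=True)
    v_next = v_next * np.minimum(1.0, v_max / np.maximum(v_norm, 1e-12))
    p_next = p + dt * v        # positions advance with the current velocity
    return p_next, v_next
```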

$$\mathbf{p}(k) = \left[ p\_1^T(k) \; \cdots \; p\_n^T(k) \right]^T \tag{2}$$

$$\mathbf{v}(k) = \left[ v\_1^T(k) \; \cdots \; v\_n^T(k) \right]^T \tag{3}$$

$$\mathbf{a}(k) = \left[ a\_1^T(k) \; \cdots \; a\_n^T(k) \right]^T \tag{4}$$

The configuration vectors are referred to without the time indexing as **p**, **v**, and **a**. The *neighborhood* of agent $i$ at time step $k$, denoted by $\mathcal{N}\_i(k) \subseteq \mathcal{A}$, contains its $N$-nearest neighbors, i.e., the $N$ other agents closest to it. We use this definition (in Section 2.2, to define a distributed-flocking cost function) for simplicity, and expect that a radius-based definition of neighborhood would lead to similar results for our distributed flocking controllers.
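The $N$-nearest-neighbor definition is straightforward to compute; a hypothetical helper (not the authors' code) could look like this:

```python
import numpy as np

def nearest_neighbors(p, i, N):
    """Return the indices of agent i's neighborhood: its N nearest agents.

    p is an (n, 2) array of agent positions.
    """
    d = np.linalg.norm(p - p[i], axis=1)
    d[i] = np.inf              # an agent is not its own neighbor
    return np.argsort(d)[:N]
```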

#### **2.1 Model-Predictive Control**

Model-Predictive control (MPC) [2] is a well-known control technique that has recently been applied to the flocking problem [11, 19, 20]. At each control step, an optimization problem is solved to find the optimal sequence of control actions (agent accelerations in our case) that minimizes a given cost function with respect to a predictive model of the system. The first control action of the optimal control sequence is then applied to the system; the rest is discarded. In the computation

of the cost function, the predictive model is evaluated for a finite prediction horizon of T control steps.

MPC-based flocking models can be categorized as *centralized* or *distributed*. A *centralized* model assumes that complete information about the flock is available to a single "global" controller, which uses the states of all agents to compute their next optimal accelerations. The following optimization problem is solved by a centralized MPC controller at each control step k:

$$\min\_{\mathbf{a}(k|k),...,\mathbf{a}(k+T-1|k)<\bar{a}} J(k) + \lambda \cdot \sum\_{t=0}^{T-1} \|\mathbf{a}(k+t\mid k)\|^2 \tag{5}$$

The first term J(k) is the centralized model-specific cost, evaluated for T control steps (this embodies the predictive aspect of MPC), starting at time step k. It encodes the control objective of minimizing the cost function J(k). The second term, scaled by a weight λ > 0, penalizes large control inputs: **a**(k + t | k) are the predictions made at time step k for the accelerations at time step k + t.

In *distributed MPC*, each agent computes its acceleration based only on its own state and its local knowledge, e.g., information about its neighbors:

$$\min\_{a\_i(k|k),...,a\_i(k+T-1|k)<\bar{a}} J\_i(k) + \lambda \cdot \sum\_{t=0}^{T-1} ||a\_i(k+t \mid k)||^2 \tag{6}$$

Ji(k) is the distributed, model-specific cost function for agent i, analogous to J(k). In a distributed setting where an agent's knowledge of its neighbors' behavior is limited, an agent cannot calculate the exact future behavior of its neighbors. Hence, the predictive aspect of Ji(k) must rely on some assumption about that behavior during the prediction horizon. Our distributed cost functions are based on the assumption that the neighbors have zero accelerations during the prediction horizon. While this simple design is clearly not completely accurate, our experiments show that it still achieves good results.
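The receding-horizon scheme of Eq. (6), together with the zero-acceleration assumption for neighbors, can be sketched as follows. This is a toy random-shooting optimizer purely for illustration (the actual controllers solve the optimization with a proper solver, and all names here are hypothetical):

```python
import numpy as np

def distributed_mpc_step(p, v, i, J_i, T, dt, a_max, lam, samples=300, seed=0):
    """Schematic receding-horizon step for agent i (Eq. (6)), solved here by
    naive random shooting.

    Neighbors are predicted with zero acceleration, as in the text: their
    positions just integrate their current velocities over the horizon.
    J_i(pp, i) scores a predicted configuration pp for agent i.
    """
    rng = np.random.default_rng(seed)
    best_cost, best_a0 = np.inf, np.zeros(2)
    for _ in range(samples):
        seq = rng.uniform(-a_max, a_max, size=(T, 2))  # candidate a_i sequence
        pp, vv = p.copy(), v.copy()
        cost = lam * float(np.sum(seq ** 2))           # control-effort penalty
        for t in range(T):
            vv[i] = vv[i] + dt * seq[t]                # only agent i accelerates
            pp = pp + dt * vv                          # all agents move
            cost += J_i(pp, i)
        if cost < best_cost:
            best_cost, best_a0 = cost, seq[0]
    return best_a0                                     # apply only the first action
```

Only the first acceleration of the best sequence is returned, mirroring the MPC convention of discarding the rest of the optimal control sequence.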

#### **2.2 Declarative Flocking**

Declarative flocking (DF) is a high-level approach to designing flocking algorithms based on defining a suitable cost function for MPC [11]. This is in contrast to the operational approach, where a set of rules are used to capture flocking behavior, as in Reynolds model. For basic flocking, the DF cost function contains two terms: (1) a *cohesion* term based on the squared distance between each pair of agents in the flock; and (2) a *separation* term based on the inverse of the squared distance between each pair of agents. The flock evolves toward a configuration in which these two opposing forces are balanced. The cost function J <sup>C</sup> for centralized DF, i.e., centralized MPC (CMPC), is as follows:

$$J^C\left(\mathbf{p}\right) = \frac{2}{|\mathcal{A}| \cdot (|\mathcal{A}| - 1)} \cdot \sum\_{i \in \mathcal{A}} \sum\_{j \in \mathcal{A}, i < j} \|p\_{ij}\|^2 + \omega\_s \cdot \sum\_{i \in \mathcal{A}} \sum\_{j \in \mathcal{A}, i < j} \frac{1}{\|p\_{ij}\|^2} \tag{7}$$

where $\omega\_s$ is the weight of the separation term and controls the density of the flock. The cost function is normalized by the number of pairs of agents, $\frac{|\mathcal{A}| \cdot (|\mathcal{A}| - 1)}{2}$; as such, the cost does not depend on the size of the flock. The control law for CMPC is given by Eq. (5), with $J(k) = \sum\_{t=1}^{T} J^C(\mathbf{p}(k + t \mid k))$.
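One reading of the pairwise cohesion-plus-separation cost, with the cohesion term normalized over the number of agent pairs and the separation sum weighted by $\omega\_s$, can be transcribed directly (an illustrative sketch, not the authors' code):

```python
import numpy as np

def J_C(p, w_s):
    """Centralized declarative-flocking cost over all agent pairs:
    normalized cohesion plus w_s times the separation sum."""
    n = len(p)
    cohes, sep = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = float(np.sum((p[i] - p[j]) ** 2))  # squared pair distance
            cohes += d2
            sep += 1.0 / d2
    return (2.0 / (n * (n - 1))) * cohes + w_s * sep
```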

The basic flocking cost function for distributed DF is similar to that for CMPC, except that the cost function J<sup>D</sup> <sup>i</sup> for agent i is computed over its set of neighbors Ni(k) at time k:

$$J\_i^{\rm D}(\mathbf{p}(k)) = \frac{1}{|\mathcal{N}\_i(k)|} \cdot \sum\_{j \in \mathcal{N}\_i(k)} ||p\_{ij}||^2 + \omega\_s \cdot \sum\_{j \in \mathcal{N}\_i(k)} \frac{1}{||p\_{ij}||^2} \tag{8}$$

The control law for agent $i$ is given by Eq. (6), with $J\_i(k) = \sum\_{t=1}^{T} J\_i^{\rm D}(\mathbf{p}(k + t \mid k))$.
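The per-agent cost of Eq. (8) differs from the centralized one only in ranging over agent $i$'s neighborhood; a minimal sketch (function name illustrative):

```python
import numpy as np

def J_D(p, i, neighbors, w_s):
    """Per-agent declarative-flocking cost of Eq. (8): mean squared distance
    to the neighbors plus w_s times the sum of inverse squared distances."""
    d2 = np.sum((p[np.asarray(neighbors)] - p[i]) ** 2, axis=1)
    return float(np.mean(d2) + w_s * np.sum(1.0 / d2))
```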

## **3 Additional Control Objectives**

The cost functions for basic flocking given in Eqs. (7) and (8) are designed to ensure that in the steady state, the agents are well-separated. Additional goals such as obstacle avoidance, predator avoidance, and target seeking are added to the MPC formulation as weighted cost-function terms. Different objectives can be combined by including the corresponding terms in the cost function as a weighted sum.

*Cost-Function Term for Obstacle Avoidance.* We consider multiple rectangular obstacles which are distributed randomly in the field. For a set of m rectangular obstacles O = {O1, O2, ..., O<sup>m</sup>}, we define the cost function term for obstacle avoidance as:

$$J\_{OA}(\mathbf{p}, \mathbf{o}) = \frac{1}{|\mathcal{A}||\mathcal{O}|} \sum\_{i \in \mathcal{A}} \sum\_{j \in \mathcal{O}} \frac{1}{\left\| p\_i - o\_j^{(i)} \right\|^2} \tag{9}$$

where $\mathbf{o}$ is the set of points on the obstacle boundaries and $o\_j^{(i)}$ is the point on the boundary of the $j$th obstacle $O\_j$ that is closest to the $i$th agent.
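For axis-aligned rectangular obstacles, the closest obstacle point can be obtained by clamping the agent's position to the box. The sketch below (hypothetical names; it assumes, as the model does, that agents remain outside the obstacles) evaluates the term of Eq. (9):

```python
import numpy as np

def J_OA(p, rects):
    """Obstacle-avoidance term of Eq. (9) for axis-aligned rectangles.

    Each obstacle is a (lo, hi) pair of corner arrays; clamping an agent's
    position to the box yields the closest obstacle point o_j^(i).
    """
    total = 0.0
    for pi in p:
        for lo, hi in rects:
            o = np.clip(pi, lo, hi)                  # nearest point of obstacle j
            total += 1.0 / float(np.sum((pi - o) ** 2))
    return total / (len(p) * len(rects))
```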

*Cost-Function Term for Target Seeking.* This term is the average of the squared distance between the agents and the target. Let $g$ denote the position of the fixed target. Then the target-seeking term is defined as

$$J\_{TS}(\mathbf{p}) = \frac{1}{|\mathcal{A}|} \sum\_{i \in \mathcal{A}} \|p\_i - g\|^2 \tag{10}$$

*Cost-Function Term for Predator Avoidance.* We introduce a single predator, which is more agile than the flocking agents: its maximum speed and acceleration are a factor of $f\_p$ greater than $\bar{v}$ and $\bar{a}$, respectively, with $f\_p > 1$. Apart from being more agile, the predator has the same dynamics as the agents, given by

Eq. (1). The control law for the predator consists of a single term that causes it to move toward the centroid of the flock with maximum acceleration.

For a flock of n agents and one predator, the cost-function term for predator avoidance is the average of the inverse of the cube of the distances between the predator and the agents. It is given by:

$$J\_{PA} \left( \mathbf{p}, p\_{pred} \right) = \frac{1}{|\mathcal{A}|} \sum\_{i \in \mathcal{A}} \frac{1}{||p\_i - p\_{pred}||^3} \tag{11}$$

where $p\_{pred}$ is the position of the predator. In contrast to the separation term in Eqs. (7)–(8), which we designed to ensure inter-agent collision avoidance, the predator-avoidance term has a cube instead of a square in the denominator. This reduces the influence of the predator on the flock when the predator is far away from the flock.

*NF Cost-Function Terms.* The MPC cost functions used in our examination of Neural Flocking are weighted sums of the cost-function terms introduced above. We refer to the first term of our centralized DF cost function $J^C(\mathbf{p})$ (see Eq. (7)) as $J\_{cohes}(\mathbf{p})$ and the second as $J\_{sep}(\mathbf{p})$. We use the following cost functions $J\_1$, $J\_2$, and $J\_3$ for basic flocking with collision avoidance, obstacle avoidance with target seeking, and predator avoidance, respectively.

$$J\_1(\mathbf{p}) = J\_{cohes}(\mathbf{p}) + \omega\_s \cdot J\_{sep}(\mathbf{p}) \tag{12a}$$

$$J\_2(\mathbf{p}, \mathbf{o}) = J\_{cohes}(\mathbf{p}) + \omega\_s \cdot J\_{sep}(\mathbf{p}) + \omega\_o \cdot J\_{OA}(\mathbf{p}, \mathbf{o}) + \omega\_t \cdot J\_{TS}(\mathbf{p}) \tag{12b}$$

$$J\_3(\mathbf{p}, p\_{pred}) = J\_{cohes}(\mathbf{p}) + \omega\_s \cdot J\_{sep}(\mathbf{p}) + \omega\_p \cdot J\_{PA}(\mathbf{p}, p\_{pred}) \tag{12c}$$

where $\omega\_s$ is the weight of the separation term, $\omega\_o$ is the weight of the obstacle-avoidance term, $\omega\_t$ is the weight of the target-seeking term, and $\omega\_p$ is the weight of the predator-avoidance term. Note that $J\_1$ is equivalent to $J^C$ (Eq. (7)). The weight $\omega\_s$ of the separation term is experimentally chosen to ensure that the distance between agents, throughout the simulation, is at least $d\_{min}$, the minimum inter-agent distance representing collision avoidance. Similar considerations were given to the choice of values for $\omega\_o$ and $\omega\_p$. The specific values we used for the weights are: $\omega\_s = 2000$, $\omega\_o = 1500$, $\omega\_t = 10$, and $\omega\_p = 500$.
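An illustrative composition of two of the weighted cost functions, using the weight values quoted above, might look as follows ($J\_2$ would add the obstacle and target terms analogously; the term implementations below are minimal transcriptions, not the authors' code):

```python
import numpy as np

# Weights for separation, obstacle avoidance, target seeking, predator avoidance.
W_S, W_O, W_T, W_P = 2000.0, 1500.0, 10.0, 500.0

def _pairs(p):
    n = len(p)
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def J_cohes(p):
    prs = _pairs(p)
    return sum(float(np.sum((p[i] - p[j]) ** 2)) for i, j in prs) / len(prs)

def J_sep(p):
    return sum(1.0 / float(np.sum((p[i] - p[j]) ** 2)) for i, j in _pairs(p))

def J_PA(p, p_pred):
    d = np.linalg.norm(p - p_pred, axis=1)
    return float(np.mean(1.0 / d ** 3))

def J_1(p):                               # basic flocking (Eq. (12a))
    return J_cohes(p) + W_S * J_sep(p)

def J_3(p, p_pred):                       # + predator avoidance (Eq. (12c))
    return J_1(p) + W_P * J_PA(p, p_pred)
```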

We experimented with an alternative strategy for introducing inter-agent collision avoidance, obstacle avoidance, and predator avoidance into the MPC problem, namely, as *constraints* of the form $d\_{min} - \|p\_{ij}\| < 0$, $d\_{min} - \|p\_i - o\_j^{(i)}\| < 0$, and $d\_{min} - \|p\_i - p\_{pred}\| < 0$, respectively. Using the theory of exact penalty functions [12], we recast the constrained MPC problem as an equivalent unconstrained MPC problem by converting the constraints into a weighted *penalty term*, which is then added to the MPC cost function. This approach rendered the optimization problem difficult to solve due to the non-smoothness of the penalty term. As a result, constraint violations in the form of collisions were observed during simulation.

## **4 Neural Flocking**

We learn a *distributed neural controller* (DNC) for the flocking problem using training data in the form of trajectories of state-action pairs produced by a CMPC controller. In addition to basic flocking with inter-agent collision avoidance, the DNC exhibits a number of other flocking-related behaviors, including obstacle avoidance, target seeking, and predator avoidance. We also show that the behavior learned by the DNC generalizes to a larger number of agents than used during training, achieving successful collision-free flocking in significantly larger flocks.

We use *Supervised Learning* to train the DNC. Supervised Learning learns a function that maps an input to an output based on example sequences of input-output pairs. In our case, the trajectory data obtained from CMPC contains both the training inputs and the corresponding labels (outputs): the state of an agent in the flock (and that of its nearest neighbors) at a particular time step is the input, and that agent's acceleration at the same time step is the label.

#### **4.1 Training Distributed Flocking Controllers**

We use Deep Learning to synthesize a distributed and symmetric neural controller from the training data provided by the CMPC controller. Our objective is to learn basic flocking, obstacle avoidance with target seeking, and predator avoidance. Their respective CMPC-based cost functions are given in Sections 2.2 and 3. All of these control objectives implicitly also include inter-agent collision avoidance by virtue of the separation term in Eq. 7.

For each of these control objectives, DNC training data is obtained from CMPC trajectory data generated for n = 15 agents, starting from initial configurations in which agent positions and velocities are uniformly sampled from [−15, 15]<sup>2</sup> and [0, 1]<sup>2</sup>, respectively. All training trajectories are 1,000 time steps in duration.

We further ensure that the initial configurations are *recoverable*; i.e., no two agents are so close to each other that they cannot avoid a collision by resorting to maximal accelerations. We learn a single DNC from the state-action pairs of all n agents. This yields a symmetric distributed controller, which we use for each agent in the flock during evaluation.
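A minimal sketch of such initial-configuration sampling in Python (illustrative, not our actual implementation; the recoverability check is simplified to a fixed minimum pairwise distance, whereas the true condition involves braking at maximal acceleration):

```python
import numpy as np

def sample_initial_config(n=15, d_safe=2.0, rng=None):
    """Rejection-sample positions in [-15,15]^2 and velocities in [0,1]^2,
    resampling until all pairwise distances exceed d_safe (a simplified
    stand-in for the recoverability condition)."""
    rng = rng or np.random.default_rng(0)
    while True:
        pos = rng.uniform(-15, 15, size=(n, 2))
        vel = rng.uniform(0, 1, size=(n, 2))
        dists = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
        np.fill_diagonal(dists, np.inf)  # ignore self-distances
        if dists.min() > d_safe:
            return pos, vel
```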

*Basic Flocking.* Trajectory data for basic flocking is generated using the cost function given in Eq. (7). We generate 200 trajectories, each of which (as noted above) is 1,000 time steps long. The input to the NN is the position and velocity of each agent along with the positions and velocities of its N-nearest neighbors. This yields 200 · 1,000 · 15 = 3M total training samples.

Let us refer to the agent (the DNC) being learned as A<sub>0</sub>. Since we use neighborhood size N = 14, the input to the NN is of the form [p<sub>0</sub><sup>x</sup> p<sub>0</sub><sup>y</sup> v<sub>0</sub><sup>x</sup> v<sub>0</sub><sup>y</sup> p<sub>1</sub><sup>x</sup> p<sub>1</sub><sup>y</sup> v<sub>1</sub><sup>x</sup> v<sub>1</sub><sup>y</sup> ... p<sub>14</sub><sup>x</sup> p<sub>14</sub><sup>y</sup> v<sub>14</sub><sup>x</sup> v<sub>14</sub><sup>y</sup>], where p<sub>0</sub><sup>x</sup>, p<sub>0</sub><sup>y</sup> are the position coordinates and v<sub>0</sub><sup>x</sup>, v<sub>0</sub><sup>y</sup> the velocity coordinates of agent A<sub>0</sub>, and p<sub>1...14</sub><sup>x</sup>, p<sub>1...14</sub><sup>y</sup> and v<sub>1...14</sub><sup>x</sup>, v<sub>1...14</sub><sup>y</sup> are the position and velocity coordinates of its neighbors. Since this input vector has 60 components, the input to the NN consists of 60 features.
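Assembling this 60-feature input vector from the flock state can be sketched as follows (illustrative helper, assuming positions and velocities are stored as n×2 arrays; not our actual data pipeline):

```python
import numpy as np

def build_features(pos, vel, agent=0, n_neighbors=14):
    """Concatenate the agent's own position and velocity with those of its
    n_neighbors nearest neighbors, yielding 4 * (n_neighbors + 1) features."""
    dists = np.linalg.norm(pos - pos[agent], axis=1)
    order = np.argsort(dists)          # the agent itself comes first (distance 0)
    idx = order[: n_neighbors + 1]
    return np.concatenate([np.concatenate([pos[i], vel[i]]) for i in idx])
```

With 15 agents and N = 14, all agents appear in the vector, giving 15 · 4 = 60 features.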

Fig. 2: Snapshots of DNC flocking behaviors for 30 agents

*Obstacle Avoidance with Target Seeking.* For obstacle avoidance with target seeking, we use CMPC with the cost function given in Eq. (12b). The target is located beyond the obstacles, forcing the agents to move through the obstacle field. For the training data, we generate 100 trajectories over 4 different obstacle fields (25 trajectories per obstacle field). The input to the NN consists of the 92 features [p<sub>0</sub><sup>x</sup> p<sub>0</sub><sup>y</sup> v<sub>0</sub><sup>x</sup> v<sub>0</sub><sup>y</sup> o<sub>0</sub><sup>x</sup> o<sub>0</sub><sup>y</sup> ... p<sub>14</sub><sup>x</sup> p<sub>14</sub><sup>y</sup> v<sub>14</sub><sup>x</sup> v<sub>14</sub><sup>y</sup> o<sub>14</sub><sup>x</sup> o<sub>14</sub><sup>y</sup> g<sup>x</sup> g<sup>y</sup>], where o<sub>0</sub><sup>x</sup>, o<sub>0</sub><sup>y</sup> is the closest point on any obstacle to agent A<sub>0</sub>; o<sub>1...14</sub><sup>x</sup>, o<sub>1...14</sub><sup>y</sup> give the closest point on any obstacle for the 14 neighboring agents; and g<sup>x</sup>, g<sup>y</sup> is the target location.

*Predator Avoidance.* The CMPC cost function for predator avoidance is given in Eq. (12c). The position, velocity, and acceleration of the predator are denoted by p<sub>pred</sub>, v<sub>pred</sub>, and a<sub>pred</sub>, respectively. We take f<sub>p</sub> = 1.40; hence v̄<sub>pred</sub> = 1.40 v̄ and ā<sub>pred</sub> = 1.40 ā. The input features to the NN are the positions and velocities of agent A<sub>0</sub> and its N-nearest neighbors, and the position and velocity of the predator. The input with 64 features thus has the form [p<sub>0</sub><sup>x</sup> p<sub>0</sub><sup>y</sup> v<sub>0</sub><sup>x</sup> v<sub>0</sub><sup>y</sup> ... p<sub>14</sub><sup>x</sup> p<sub>14</sub><sup>y</sup> v<sub>14</sub><sup>x</sup> v<sub>14</sub><sup>y</sup> p<sub>pred</sub><sup>x</sup> p<sub>pred</sub><sup>y</sup> v<sub>pred</sub><sup>x</sup> v<sub>pred</sub><sup>y</sup>].

## **5 Experimental Evaluation**

This section contains the results of our extensive performance analysis of the distributed neural flocking controller (DNC), taking into account various control objectives: basic flocking with collision avoidance, obstacle avoidance with target seeking, and predator avoidance. As illustrated in Fig. 1, this involves running CMPC to generate the training data for the DNCs, whose performance we then compare to that of the DMPC and CMPC controllers. We also show that the DNC flocking controllers generalize the behavior seen in the training data to achieve successful collision-free flocking in flocks significantly larger in size than those used during training. Finally, we use Statistical Model Checking to obtain confidence intervals for DNC's correctness/performance.

#### **5.1 Preliminaries**

The CMPC and DMPC control problems defined in Section 2.1 are solved using the MATLAB fmincon optimizer. In the training phase, the size of the flock is n = 15. For obstacle avoidance with target seeking, we use 5 obstacles with the target located at [60, 50]. The simulation time is 100 time units, dt = 0.1, and η = 3, where (recall) η · dt is the control period. Further, the agent velocity and acceleration bounds are v̄ = 2.0 and ā = 1.5.

We use d<sub>min</sub> = 1.5 as the minimum inter-agent distance for collision avoidance, d<sub>min</sub><sup>obs</sup> = 1 as the minimum agent-obstacle distance for obstacle avoidance, and d<sub>min</sub><sup>pred</sup> = 1.5 as the minimum agent-predator distance for predator avoidance. For initial configurations, recall that agent positions and velocities are uniformly sampled from [−15, 15]<sup>2</sup> and [0, 1]<sup>2</sup>, respectively, and we ensure that they are *recoverable*; i.e., no two agents are so close to each other that they cannot avoid a collision when resorting to maximal accelerations. The predator starts at rest from a fixed location at a distance of 40 from the flock center.

For training, we considered 15 agents and 200 trajectories, each 1,000 time steps in length; with one sample per agent per time step, this yielded a total of 3,000,000 training samples. Our neural controller is a fully connected feed-forward Deep Neural Network (DNN) with 5 hidden layers, 84 neurons per hidden layer, and ReLU activation functions. We chose the DNN hyperparameters and architecture iteratively, refining the network until we observed satisfactory performance from the DNC.

For training the DNNs, we use Keras [3], a high-level neural network API written in Python capable of running on top of TensorFlow. To generate the NN model, Keras uses the Adam optimizer [8] with the following settings: lr = 10<sup>−2</sup>, β<sub>1</sub> = 0.9, β<sub>2</sub> = 0.999, ε = 10<sup>−8</sup>. The batch size (number of samples processed before the model is updated) is 2,000, and the number of epochs (number of complete passes through the training dataset) used for training is 1,000. For measuring training loss, we use the mean-squared-error metric.

For basic flocking, DNN input vectors have 60 features and the number of trainable DNN parameters is 33,854. For flocking with obstacle-avoidance and target-seeking, input vectors have 92 features and the number of trainable parameters is 36,542. Finally, for flocking with predator-avoidance, input vectors have 64 features and the resulting number of trainable DNN parameters is 34,190.
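These trainable-parameter counts can be checked directly from the stated architecture (5 hidden layers of 84 units), under the assumption (implied by the 2D setting) that the output layer has 2 units for the x- and y-components of the acceleration:

```python
def dense_params(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(m * n + n for m, n in zip(layer_sizes, layer_sizes[1:]))

hidden = [84] * 5
# 60, 92, and 64 input features for basic flocking, obstacle avoidance
# with target seeking, and predator avoidance, respectively.
for n_features, expected in [(60, 33854), (92, 36542), (64, 34190)]:
    assert dense_params([n_features] + hidden + [2]) == expected
```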

To test the trained DNC, we generated 100 simulations (runs) for each of the desired control objectives: basic flocking with collision avoidance, flocking with obstacle avoidance and target seeking, and flocking with predator avoidance. The results presented in Table 1 were obtained using the same number of agents and obstacles and the same predator as in the training phase. We also ran tests showing that DNC controllers can achieve collision-free flocking with obstacle avoidance when the numbers of agents and obstacles exceed those used during training.

#### **5.2 Results for Basic Flocking**

We use flock diameter, inter-agent collision count, and velocity convergence [20] as performance metrics for flocking behavior. At any time step, the *flock diameter* D(**p**) = max<sub>(i,j)∈A</sub> p<sub>ij</sub> is the largest distance between any two agents in the flock. We calculate the average converged diameter by averaging the flock diameter

Fig. 3: Performance comparison for basic flocking with collision avoidance, averaged over 100 test runs.

in the final time step of the simulation over the 100 runs. An inter-agent collision (IC) occurs when the distance between two agents at any point in time is less than d<sub>min</sub>. The IC rate (ICR) is the average number of ICs per test-trajectory time step. The velocity convergence *VC*(**v**) = (1/n) ∑<sub>i∈A</sub> ||v<sub>i</sub> − (∑<sub>j=1</sub><sup>n</sup> v<sub>j</sub>)/n||<sup>2</sup> is the average of the squared magnitude of the discrepancy between the velocities of agents and the flock's average velocity. For all the metrics, lower values are better, indicating a denser and more coherent flock with fewer collisions. A successful flocking controller should also ensure that the values of D(**p**) and *VC*(**v**) eventually stabilize.
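Both metrics are straightforward to compute from the flock state; a minimal sketch (illustrative Python, with positions and velocities as n×2 arrays):

```python
import numpy as np

def flock_diameter(pos):
    """D(p): the largest pairwise distance between agents."""
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    return d.max()

def velocity_convergence(vel):
    """VC(v): mean squared deviation of agent velocities from the flock mean."""
    dev = vel - vel.mean(axis=0)
    return np.mean(np.sum(dev**2, axis=1))
```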

Fig. 3 and Table 1 compare the performance of the DNC on the basic-flocking problem for 15 agents to that of the MPC controllers. Although DMPC and CMPC outperform the DNC, the difference is marginal. An important advantage of the DNC over the MPC controllers is that it is much faster. Executing a DNC controller requires a modest number of arithmetic operations, whereas executing an MPC controller requires simulation of a model and controller over the prediction horizon. In our experiments, on average, CMPC takes 1,209 msec of CPU time for the entire flock and DMPC takes 58 msec of CPU time per agent, whereas the DNC takes only 1.6 msec.


Table 1: Performance comparison for BF with 15 agents on 100 test runs


Table 2: DNC Performance Generalization for BF

#### **5.3 Results for Obstacle and Predator Avoidance**

For obstacle and predator avoidance, collision rates are used as performance metrics. An obstacle-agent collision (OC) occurs when the distance between an agent and the closest point on any obstacle is less than d<sub>min</sub><sup>obs</sup>. A predator-agent collision (PC) occurs when the distance between an agent and the predator is less than d<sub>min</sub><sup>pred</sup>. The OC rate (OCR) is the average number of OCs per test-trajectory time step, and the PC rate (PCR) is defined similarly. Our test results show that the DNC, along with DMPC and CMPC, is collision-free (i.e., each of ICR, OCR, and PCR is zero) for 15 agents, with the exception of DMPC for predator avoidance, where PCR = 0.013. We also observed that the flock successfully reaches the target location in all 100 test runs.

#### **5.4 DNC Generalization Results**

Tables 2–3 present DNC generalization results for basic flocking (BF), obstacle avoidance (OA), and predator avoidance (PA), with the number of agents ranging from 15 (the flock size during training) to 40. In all of these experiments, we use a neighborhood size of N = 14, the same as during training. Each controller was evaluated with 100 test runs. The performance metrics in Table 2 are the average converged diameter, convergence rate, average convergence time, and ICR.

The convergence rate is the fraction of successful flocks over the 100 runs. The collection of agents is said to have converged to a flock (with collision avoidance) if the value of the global cost function is less than the convergence threshold. We use a convergence threshold of J<sub>1</sub>(**p**) ≤ 150, chosen based on its proximity to the value achieved by CMPC. We use the cost function from Eq. 12a to compute the convergence rate because we are measuring convergence for basic flocking. The average convergence time is the time at which the global cost function first drops below the convergence threshold and remains below it for the rest of the run, averaged over all 100 runs. Even with a local neighborhood of size 14, the results demonstrate that the DNC can successfully generalize to a large number of agents for all of our control objectives.
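The convergence-time computation described above can be sketched as follows (a hypothetical helper, not from our codebase, taking the trajectory of global cost values for one run):

```python
def convergence_time(costs, threshold=150.0):
    """First time step at which the cost drops below the threshold and
    stays below it for the rest of the run; None if the run never converges."""
    for t in range(len(costs)):
        if all(c < threshold for c in costs[t:]):
            return t
    return None
```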


Table 3: DNC Generalization Performance for OA and PA

#### **5.5 Statistical Model Checking Results**

We use Monte Carlo (MC) approximation as a form of Statistical Model Checking [5, 10] to compute confidence intervals for the DNC's convergence rate to a flock with collision avoidance and for the (normalized) convergence time. The convergence rate is the fraction of successful flocks over N runs. The collection of agents is said to have converged to a successful flock with collision avoidance if the global cost function satisfies J<sub>1</sub>(**p**) ≤ 150, where J<sub>1</sub>(**p**) is the cost function for basic flocking defined in Eq. 12a.

The main idea of MC is to use N random variables Z<sub>1</sub>,...,Z<sub>N</sub>, also called samples, IID distributed according to a random variable Z with mean μ<sub>Z</sub>, and to take the average μ̃<sub>Z</sub> = (Z<sub>1</sub> + ... + Z<sub>N</sub>)/N as the value approximating the mean μ<sub>Z</sub>. Since an exact computation of μ<sub>Z</sub> is almost always intractable, an MC approach is used to compute an (ε, δ)-approximation of this quantity.

*Additive Approximation* [6] is an (ε, δ)-approximation scheme in which the mean μ<sub>Z</sub> of an RV Z is approximated with absolute error ε and probability 1 − δ:

$$\Pr\left[\mu_Z - \epsilon \le \tilde{\mu}_Z \le \mu_Z + \epsilon\right] \ge 1 - \delta \tag{13}$$

where μ̃<sub>Z</sub> is an approximation of μ<sub>Z</sub>. An important issue is to determine the number of samples N needed to ensure that μ̃<sub>Z</sub> is an (ε, δ)-approximation of μ<sub>Z</sub>. If Z is a Bernoulli variable whose mean is expected to be large, one can use the Chernoff-Hoeffding instantiation of the Bernstein inequality and take N = 4 ln(2/δ)/ε<sup>2</sup>, as in [6]. This results in the *additive approximation algorithm* [5], shown in Algorithm 1.

We use this algorithm to obtain a joint (ε, δ)-approximation of the mean convergence rate and mean normalized convergence time for the DNC. Each sample Z<sub>i</sub> is based on the result of an execution obtained by simulating the system starting from a random initial state, and we take Z = (B, R), where B is a Boolean variable indicating whether the agents converged to a flock during the execution, and R is a real value denoting the normalized convergence time. The normalized convergence time is the time at which the global cost function first drops below the convergence threshold and remains below it for the rest of the run, measured as a fraction of the total duration of the run. The assumptions

#### **Algorithm 1: Additive Approximation Algorithm**

**Input:** (ε, δ) with 0 < ε < 1 and 0 < δ < 1
**Input:** Random variables Z<sub>i</sub>, IID
**Output:** μ̃<sub>Z</sub>, an approximation of μ<sub>Z</sub>

N = 4 ln(2/δ)/ε<sup>2</sup>; S = 0;
**for** (i = 1; i ≤ N; i++) **do** S = S + Z<sub>i</sub>;
μ̃<sub>Z</sub> = S/N;
**return** μ̃<sub>Z</sub>
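Algorithm 1 amounts to only a few lines of Python. The sketch below is illustrative (the Bernoulli sampler and the seed are hypothetical choices, not from our experiments); note that for ε = 0.01 and δ = 0.0001 it reproduces the sample size N = 396,140 used in Section 5.5:

```python
import math
import random

def additive_approx(sample, eps, delta, rng):
    """Chernoff-Hoeffding additive (eps, delta)-approximation of E[Z].

    `sample(rng)` draws one IID sample Z_i; returns (estimate, N).
    """
    n = math.ceil(4 * math.log(2 / delta) / eps**2)
    total = sum(sample(rng) for _ in range(n))
    return total / n, n

rng = random.Random(42)
# Example: estimate the mean of a fair Bernoulli variable.
est, n = additive_approx(lambda r: r.random() < 0.5, 0.01, 0.0001, rng)
```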

Table 4: SMC results for DNC convergence rate and normalized convergence time; ε = 0.01, δ = 0.0001


about Z required for validity of the additive approximation hold, because RV B is a Bernoulli variable, the convergence rate is expected to be large (i.e., closer to 1 than to 0), and the proportionality constraint of the Bernstein inequality is also satisfied for RV R.

In these experiments, the initial configurations are sampled from the same distributions as in Section 5.1, and we set ε = 0.01 and δ = 0.0001, obtaining N = 396,140. We perform the required set of N simulations for 15, 20, 25, 30, 35, and 40 agents. Table 4 presents the results, specifically, the (ε, δ)-approximations μ̃<sub>CR</sub> and μ̃<sub>CT</sub> of the mean convergence rate and the mean normalized convergence time, respectively. While the results for the convergence rate are (as expected) numerically similar to those in Table 2, the results in Table 4 are much stronger, because they come with the guarantee that they are (ε, δ)-approximations of the actual mean values.

## **6 Related Work**

In [18], a flocking controller is synthesized using multi-agent reinforcement learning (MARL) and natural evolution strategies (NES). The target model from which the system learns is the Reynolds flocking model [16]. For training purposes, a set of metrics called *entropy* is chosen, which provides a measure of the collective behavior displayed by the target model. As the authors of [18] observe, this technique does not quite work: although it consistently leads to agents forming recognizable patterns during simulation, the agents self-organize into a cluster instead of flowing like a flock.

In [9], reinforcement learning and flocking control are combined for the purpose of predator avoidance, where the learning module determines safe spaces in which the flock can navigate to avoid predators. Their approach to predator avoidance, however, is not distributed, as it requires a majority consensus by the flock to determine its action to avoid predators. They also impose an α-lattice structure [13] on the flock. In contrast, our approach is geometry-agnostic and achieves predator avoidance in a distributed manner.

In [7], an uncertainty-aware reinforcement learning algorithm is developed to estimate the probability of a mobile robot colliding with an obstacle in an unknown environment. Their approach is based on bootstrapped neural networks using dropout, allowing it to process raw sensory inputs. Similarly, a learning-based approach to robot navigation and obstacle avoidance is presented in [14]. The authors train a model that maps sensor inputs and the target position to motion commands generated by the ROS [15] navigation package. Our work, in contrast, considers obstacle avoidance (and other control objectives) in a multi-agent flocking scenario under the simplifying assumption of full state observation.

In [4], an approach based on Bayesian inference is proposed that allows an agent in a heterogeneous multi-agent environment to estimate the navigation model and goal of each of its neighbors. It then uses this information to compute a plan that minimizes inter-agent collisions while allowing the agent to reach its goal. Flocking formation is not considered.

## **7 Conclusions**

With the introduction of Neural Flocking (NF), we have shown how machine learning in the form of Supervised Learning can bring many benefits to the flocking problem. As our experimental evaluation confirms, the symmetric and fully distributed neural controllers we derive in this manner are capable of achieving a multitude of flocking-oriented objectives, including flocking formation, inter-agent collision avoidance, obstacle avoidance, predator avoidance, and target seeking. Moreover, NF controllers exhibit real-time performance and generalize the behavior seen in the training data to achieve these objectives in a significantly broader range of scenarios.

Ongoing work aims to determine whether a DNC can perform as well as the centralized MPC controller for agent models that are significantly more realistic than our current point-based model. For this purpose, we are using transfer learning to train a DNC that achieves acceptable performance on realistic quadrotor dynamics [1], starting from our current point-model-based DNC. This effort also involves extending our current DNC from 2-dimensional to 3-dimensional spatial coordinates. If successful (and preliminary results are encouraging), this line of research will demonstrate that DNCs are capable of achieving flocking with complex, realistic dynamics.

For future work, we plan to investigate a distance-based notion of agent neighborhood as opposed to our current nearest-neighbors formulation. Furthermore, motivated by the quadrotor study of [21], we will seek to combine MPC with reinforcement learning in the framework of guided policy search as an alternative solution technique for the NF problem.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **On Well-Founded and Recursive Coalgebras**⋆

Jiří Adámek<sup>1,⋆⋆</sup>, Stefan Milius<sup>2,⋆⋆⋆</sup>, and Lawrence S. Moss<sup>3,†</sup>

<sup>1</sup> Czech Technical University, Prague, Czech Republic, j.adamek@tu-braunschweig.de
<sup>2</sup> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, mail@stefan-milius.eu
<sup>3</sup> Indiana University, Bloomington, IN, USA, lmoss@indiana.edu

**Abstract** This paper studies fundamental questions concerning category-theoretic models of induction and recursion. We are concerned with the relationship between well-founded and recursive coalgebras for an endofunctor. For monomorphism-preserving endofunctors on complete and well-powered categories, every coalgebra has a well-founded part, and we provide a new, shorter proof that this is the coreflection in the category of all well-founded coalgebras. We present a new, more general proof of Taylor's General Recursion Theorem that every well-founded coalgebra is recursive, and we study conditions which imply the converse. In addition, we present a new equivalent characterization of well-foundedness: a coalgebra is well-founded iff it admits a coalgebra-to-algebra morphism to the initial algebra.

**Keywords:** Well-founded · Recursive · Coalgebra · Initial Algebra · General Recursion Theorem

## **1 Introduction**

What is induction? What is recursion? In areas of theoretical computer science, the most common answers are related to *initial algebras*. Indeed, the dominant trend in abstract data types is initial algebra semantics (see e.g. [19]), and this approach has spread to other semantically inclined areas of the subject. The approach in broad slogans is that, for an endofunctor *F* describing the type of algebraic operations of interest, the initial algebra *μF* has the property that for every *F*-algebra *A*, there is a unique homomorphism *μF* → *A*, and this *is* recursion. Perhaps the primary example is *recursion on* N*, the natural numbers*. Recall that N is the initial algebra for the set functor *FX* = *X* + 1. If *A* is any set, and *a* ∈ *A* and *α*: *A* → *A* are given, then initiality tells us that there is a unique *f* : N → *A* such that for all *n* ∈ N,

$$f(0) = a \qquad f(n+1) = \alpha(f(n)).\tag{1.1}$$
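As an illustration (ours, not from the original text), the unique map f determined by initiality for FX = X + 1 is exactly the familiar iterator, sketched here in Python:

```python
def fold_nat(a, alpha, n):
    """The unique f: N -> A with f(0) = a and f(n+1) = alpha(f(n)),
    given by initiality of N as an algebra for F X = X + 1."""
    result = a
    for _ in range(n):
        result = alpha(result)
    return result
```

For instance, `fold_nat(0, lambda x: x + 2, n)` computes doubling, and `fold_nat(1, lambda x: 2 * x, n)` computes 2<sup>n</sup>.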

⋆ A full version of this paper including full proof details is available on arXiv [5].

⋆⋆ Supported by the Grant Agency of the Czech Republic under grant 19-00902S.

⋆⋆⋆ Supported by Deutsche Forschungsgemeinschaft (DFG) under project MI 717/5-2.

† Supported by grant #586136 from the Simons Foundation.

© The Author(s) 2020

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 17–36, 2020. https://doi.org/10.1007/978-3-030-45231-5_2

Then the first additional problem coming with this approach is that of how to "recognize" initial algebras: Given an algebra, how do we really know if it is initial? The answer – again in slogans – is that initial algebras are the ones with "no junk and no confusion."

Although initiality captures some important aspects of recursion, it cannot be a fully satisfactory approach. One big missing piece concerns recursive definitions based on well-founded relations. For example, the whole study of termination of rewriting systems depends on well-orders, the primary example of *recursion on a well-founded order*. Let (*X, R*) be a well-founded relation, i.e. one with no infinite sequences ⋯ *x*<sub>2</sub> *R x*<sub>1</sub> *R x*<sub>0</sub>. Let *A* be any set, and let *α*: P*A* → *A*. (Here and below, P is the power set functor, taking a set to the set of its subsets.) Then there is a unique *f* : *X* → *A* such that for all *x* ∈ *X*,

$$f(x) = \alpha(\{f(y) : y \mathrel{R} x\}).\tag{1.2}$$
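A concrete sketch of this recursion scheme (again ours, for illustration): given the R-predecessors of each element, the defining equation can be executed directly, and termination is exactly well-foundedness of R. The helper names are hypothetical:

```python
def wf_rec(alpha, below, x):
    """The unique f with f(x) = alpha({f(y) : y R x}), where below(x)
    enumerates the R-predecessors of x; this terminates precisely
    because R is well-founded."""
    return alpha(frozenset(wf_rec(alpha, below, y) for y in below(x)))

# Example: the rank function on N under y R x iff y < x.
rank = lambda n: wf_rec(lambda s: max(s, default=-1) + 1, lambda m: range(m), n)
```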

The main goal of this paper is the study of concepts that allow one to extend the algebraic spirit behind initiality in (1.1) to the setting of recursion arising from well-foundedness as we find it in (1.2). The corresponding concepts are those of well-founded and recursive coalgebras for an endofunctor, which first appear in work by Osius [22] and Taylor [23, 24], respectively. In his work on categorical set theory, Osius [22] first studied the notions of well-founded and recursive coalgebras (for the power-set functor on sets and, more generally, the power-object functor on an elementary topos). He defined recursive coalgebras as those coalgebras *α*: *A* → P*A* which have a unique coalgebra-to-algebra homomorphism into every algebra (see Definition 3.2).

Taylor [23,24] took Osius' ideas much further. He introduced well-founded coalgebras for a general endofunctor, capturing the notion of a well-founded relation categorically, and considered recursive coalgebras under the name 'coalgebras obeying the recursion scheme'. He then proved the General Recursion Theorem that all well-founded coalgebras are recursive, for every endofunctor on sets (and on more general categories) preserving inverse images. Recursive coalgebras were also investigated by Eppendahl [12], who called them algebra-initial coalgebras. Capretta, Uustalu, and Vene [10] further studied recursive coalgebras, and they showed how to construct new ones from given ones by using comonads. They also explained nicely how recursive coalgebras allow for the semantic treatment of (functional) divide-and-conquer programs. More recently, Jeannin et al. [15] proved the General Recursion Theorem for polynomial functors on the category of many-sorted sets; they also provide many interesting examples of recursive coalgebras arising in programming.

Our contributions in this paper are as follows. We start by recalling some preliminaries in Section 2 and the definition of (parametrically) recursive coalgebras in Section 3 and of well-founded coalgebras in Section 4 (using a formulation based on Jacobs' next time operator [14], which we extend from Kripke polynomial set functors to arbitrary functors). We show that every coalgebra for a monomorphism preserving functor on a complete and well-powered category has a well-founded part, and provide a new proof that this is the coreflection in the

category of well-founded coalgebras (Proposition 4.19), shortening our previous proof [6]. Next we provide a new proof of Taylor's General Recursion Theorem (Theorem 5.1), generalizing it to endofunctors preserving monomorphisms on a complete and well-powered category having smooth monomorphisms (see Definition 2.8). For the category of sets, this implies that "well-founded ⇒ recursive" holds for all endofunctors, strengthening Taylor's result. We then discuss the converse: is every recursive coalgebra well-founded? Here the assumption that *F* preserves inverse images cannot be lifted, and one needs additional assumptions. In fact, we present two results. One assumes universally smooth monomorphisms and that the functor has a pre-fixed point (see Theorem 5.5). Under these assumptions we also give a new equivalent characterization of recursiveness and well-foundedness: a coalgebra is recursive iff it has a coalgebra-to-algebra morphism into the initial algebra (which exists under our assumptions), see Corollary 5.6. This characterization was previously established for finitary functors on sets [3]. The other result, due to Taylor, uses the concept of a subobject classifier (Theorem 5.8); it implies that 'recursive' and 'well-founded' are equivalent concepts for all set functors preserving inverse images. We also prove that a similar result holds for the category of vector spaces over a fixed field (Theorem 5.12).

Finally, we show in Section 6 that well-founded coalgebras are closed under coproducts, quotients and, under mild assumptions, subcoalgebras.

## **2 Preliminaries**

We start by recalling some background material. Except for the definitions of *algebra* and *coalgebra* in Subsection 2.1, the subsections below may be read as needed. We assume that readers are familiar with notions of basic category theory; see e.g. [2] for everything we do not detail. We indicate monomorphisms by writing ↣ and strong epimorphisms by ↠.

**2.1 Algebras and Coalgebras.** We are concerned throughout this paper with *algebras* and *coalgebras* for an endofunctor. This means that we have an underlying category, usually written A ; frequently it is the category of sets or of vector spaces over a fixed field, and that a functor *F* : A → A is given. An *F-algebra* is a pair (*A, α*), where *α*: *F A* → *A*. An *F-coalgebra* is a pair (*A, α*), where *α*: *A* → *F A*. We usually drop the functor *F*. Given two algebras (*A, α*) and (*B, β*), an *algebra homomorphism* from the first to the second is *h*: *A* → *B* in A such that *h* · *α* = *β* · *F h*. Similarly, a *coalgebra homomorphism* satisfies *β* · *h* = *F h* · *α*. We denote by Coalg *F* the category of all coalgebras for *F*.

**Example 2.1.** (1) The power set functor P : Set → Set takes a set *X* to the set P*X* of all subsets of it; for a morphism *f* : *X* → *Y* , P*f* : P*X* → P*Y* takes a subset *S* ⊆ *X* to its direct image *f*[*S*]. Coalgebras *α*: *X* → P*X* may be identified with directed graphs on the set *X* of vertices, and the coalgebra structure *α* describes the edges: *b* ∈ *α*(*a*) means that there is an edge *a* → *b* in the graph.

(2) Let *Σ* be a signature, i.e. a set of operation symbols, each with a finite arity. The *polynomial functor H<sup>Σ</sup>* associated to *Σ* assigns to a set *X* the set

$$H_{\Sigma}X = \coprod_{n \in \mathbb{N}} \Sigma_n \times X^n,$$

where *Σ*<sub>n</sub> is the set of operation symbols of arity *n*. This may be identified with the set of all terms *σ*(*x*<sub>1</sub>*,...,x*<sub>n</sub>), for *σ* ∈ *Σ*<sub>n</sub> and *x*<sub>1</sub>*,...,x*<sub>n</sub> ∈ *X*. Algebras for *H*<sub>Σ</sub> are the usual *Σ*-algebras.

(3) Deterministic automata over an input alphabet *Σ* are coalgebras for the functor *F X* = {0*,* 1} × *X<sup>Σ</sup>*. Indeed, given a set *S* of states, a next-state map *S* × *Σ* → *S* may be curried to *δ*: *S* → *S<sup>Σ</sup>*. The set of final states yields the acceptance predicate *a*: *S* → {0*,* 1}. So an automaton may be regarded as a coalgebra ⟨*a, δ*⟩: *S* → {0*,* 1} × *S<sup>Σ</sup>*.
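The currying step can likewise be illustrated in Python; the automaton below (accepting words with an even number of 'a's) and all names are our own, purely illustrative choices:

```python
# A deterministic automaton as a coalgebra <a, delta>: S -> {0,1} x S^Sigma.
# The next-state map S x Sigma -> S is curried to delta: S -> (Sigma -> S).

NEXT = {("even", "a"): "odd", ("even", "b"): "even",
        ("odd", "a"): "even", ("odd", "b"): "odd"}
FINAL = {"even"}  # accept words with an even number of 'a's

def coalg(s):
    """The coalgebra structure: s |-> (acceptance bit, curried next-state map)."""
    return (1 if s in FINAL else 0, lambda c: NEXT[(s, c)])

def accepts(word, s="even"):
    for c in word:
        s = coalg(s)[1](c)   # apply the curried next-state map
    return coalg(s)[0] == 1  # read off the acceptance bit

assert accepts("abab") and not accepts("abb")
```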

(4) Labelled transitions systems are coalgebras for *F X* = P(*Σ* × *X*).

(5) To describe linear weighted automata, i.e. weighted automata over the input alphabet *Σ* with weights in a field *K*, as coalgebras, one works with the category Vec<sub>*K*</sub> of vector spaces over *K*. A linear weighted automaton is then a coalgebra for *F X* = *K* × *X<sup>Σ</sup>*.

**2.2 Preservation Properties.** Recall that an intersection of two subobjects *s<sub>i</sub>* : *S<sub>i</sub>* ↣ *A* (*i* = 1*,* 2) of a given object *A* is given by their pullback. Analogously, (general) intersections are given by wide pullbacks. Furthermore, the inverse image of a subobject *s*: *S* ↣ *B* under a morphism *f* : *A* → *B* is the subobject *t*: *T* ↣ *A* obtained by a pullback of *s* along *f*.

All of the 'usual' set functors preserve intersections and inverse images:

**Example 2.2.** (1) Every polynomial functor preserves intersections and inverse images.

(2) The power-set functor P preserves intersections and inverse images.

(3) Intersection-preserving set functors are closed under taking coproducts, products, and composition. Similarly for inverse images.

(4) Consider next the set functor *R* defined by *RX* = {(*x, y*) ∈ *X* × *X* : *x* ≠ *y*} + {*d*} for sets *X*. For a function *f* : *X* → *Y* put *Rf*(*x, y*) = (*f*(*x*)*, f*(*y*)) if *f*(*x*) ≠ *f*(*y*), and *d* otherwise (with *Rf*(*d*) = *d*). *R* preserves intersections but not inverse images.

**Proposition 2.3 [27].** *For every set functor F there exists an essentially unique set functor F̄ which coincides with F on nonempty sets and functions and preserves finite intersections (whence monomorphisms).*

**Remark 2.4.** (1) In fact, Trnková gave a construction of *F̄*: she defined *F̄*∅ as the set of all natural transformations *C*<sub>01</sub> → *F*, where *C*<sub>01</sub> is the set functor with *C*<sub>01</sub>∅ = ∅ and *C*<sub>01</sub>*X* = 1 for all nonempty sets *X*. For the empty map *e*: ∅ → *X* with *X* ≠ ∅, *F̄e* maps a natural transformation *τ* : *C*<sub>01</sub> → *F* to the element of *F X* given by *τ<sub>X</sub>* : 1 → *F X*.

(2) The above functor *F̄* is called the *Trnková hull* of *F*. It allows us to achieve preservation of intersections for all *finitary* set functors. Intuitively, a functor on sets is finitary if its behavior is completely determined by its action on *finite* sets and functions. For a general functor, this intuition is captured by requiring that the functor preserves filtered colimits [8]. For a set functor *F* this is equivalent to being *finitely bounded*, which is the following condition: for each element *x* ∈ *F X* there exists a finite subset *M* ⊆ *X* such that *x* ∈ *Fi*[*F M*], where *i*: *M* → *X* is the inclusion map [7, Rem. 3.14].

**Proposition 2.5 [4, p. 66].** *The Trnková hull of a finitary set functor preserves all intersections.*

**2.3 Factorizations.** Recall that an epimorphism *e*: *A* → *B* is called *strong* if it satisfies the following *diagonal fill-in property*: given a monomorphism *m*: *C* ↣ *D* and morphisms *f* : *A* → *C* and *g* : *B* → *D* such that *m* · *f* = *g* · *e*, there exists a unique *d*: *B* → *C* such that *f* = *d* · *e* and *g* = *m* · *d*.

Every complete and well-powered category has factorizations of morphisms: every morphism *f* may be written as *f* = *m* · *e*, where *e* is a strong epimorphism and *m* is a monomorphism [9, Prop. 4.4.3]. We call the subobject *m* the *image* of *f*. It follows from a result in Kurz' thesis [16, Prop. 1.3.6] that factorizations of morphisms lift to coalgebras:

**Proposition 2.6 (**Coalg *F* **inherits factorizations from** A **).** *Suppose that F preserves monomorphisms. Then the category* Coalg *F has factorizations of homomorphisms f as f* = *m* · *e, where e is carried by a strong epimorphism and m by a monomorphism in* A *. The diagonal fill-in property holds in* Coalg *F.*

**Remark 2.7.** By a *subcoalgebra* of a coalgebra (*A, α*) we mean a subobject in Coalg *F* represented by a homomorphism *m*: (*B, β*) ↣ (*A, α*), where *m* is monic in A. Similarly, by a *strong quotient* of a coalgebra (*A, α*) we mean one represented by a homomorphism *e*: (*A, α*) ↠ (*C, γ*) with *e* strongly epic in A.

**2.4 Chains.** By a *transfinite chain* in a category A we understand a functor from the ordered class Ord of all ordinals into A . Moreover, for an ordinal *λ*, a *λ-chain* in A is a functor from *λ* to A . A category *has colimits of chains* if for every ordinal *λ* it has a colimit of every *λ*-chain. This includes the initial object 0 (the case *λ* = 0).

**Definition 2.8.** (1) A category A has *smooth monomorphisms* if for every *λ*-chain *C* of monomorphisms a colimit exists, its colimit cocone is formed by monomorphisms, and for every cone of *C* formed by monomorphisms, the factorizing morphism from colim *C* is monic. In particular, every morphism from 0 is monic.

(2) A has *universally smooth monomorphisms* if A also has pullbacks and, for every morphism *f* : *X* → colim *C*, the functor A */* colim *C* → A */X* forming pullbacks along *f* preserves the colimit of *C*. This implies that the initial object 0 is *strict*, i.e. every morphism *f* : *X* → 0 is an isomorphism; indeed, consider the empty chain (*λ* = 0).

**Example 2.9.** (1) Set has universally smooth monomorphisms.

(2) Vec<sub>*K*</sub> has smooth monomorphisms, but not universally so, because the initial object is not strict.

(3) Categories in which colimits of chains and pullbacks are formed "set-like" have universally smooth monomorphisms. These include the categories of posets, graphs, topological spaces, presheaf categories, and many varieties, such as monoids, groups, and unary algebras.

(4) Every locally finitely presentable category A with a strict initial object (see Remark 2.12(1)) has smooth monomorphisms. This follows from [8, Prop. 1.62]. Moreover, since pullbacks commute with colimits of chains, it is easy to prove that colimits of chains are universal using the strictness of 0.

(5) The category CPO of complete partial orders does not have smooth monomorphisms. Indeed, consider the *ω*-chain of linearly ordered sets *A<sub>n</sub>* = {0*,...,n*} + {⊤} (⊤ a top element) with inclusion maps *A<sub>n</sub>* → *A<sub>n</sub>*<sub>+1</sub>. Its colimit is the linearly ordered set ℕ + {⊤′*,* ⊤} of natural numbers with two added top elements ⊤′ *<* ⊤. For the sub-cpo ℕ + {⊤}, the inclusions of *A<sub>n</sub>* are monic and form a cocone. But the unique factorizing morphism from the colimit is not monic.

**Notation 2.10.** For every object *A* we denote by Sub(*A*) the poset of all subobjects of *A* (represented by monomorphisms *s*: *S* ↣ *A*), where *s* ≤ *s′* if there exists *i* with *s* = *s′* · *i*. If A has pullbacks we have, for every morphism *f* : *A* → *B*, the *inverse image operator*, viz. the monotone map ←−*f* : Sub(*B*) → Sub(*A*) assigning to a subobject *s*: *S* ↣ *B* the subobject of *A* obtained by forming the inverse image of *s* under *f*, i.e. the pullback of *s* along *f*.

**Lemma 2.11.** *If* A *is complete and well-powered, then* ←−*f has a left adjoint given by the* (direct) image operator −→*f* : Sub(*A*) → Sub(*B*)*. It maps a subobject t*: *T* ↣ *A to the subobject of B given by the image of f* · *t; in symbols we have* −→*f*(*t*) ≤ *s iff t* ≤ ←−*f*(*s*)*.*

**Remark 2.12.** If A is a complete and well-powered category, then Sub(*A*) is a complete lattice. Now suppose that A has smooth monomorphisms.

(1) In this setting, the unique morphism ⊥*<sup>A</sup>* : 0 → *A* is a monomorphism and therefore is the bottom element of the poset Sub(*A*).

(2) Furthermore, a join of a chain in Sub(*A*) is obtained by forming a colimit, in the obvious way.

(3) If A has universally smooth monomorphisms, then for every morphism *<sup>f</sup>* : *<sup>A</sup>* <sup>→</sup> *<sup>B</sup>*, the operator ←−*<sup>f</sup>* : Sub(*B*) <sup>→</sup> Sub(*A*) preserves unions of chains.

**Remark 2.13.** Recall [1] that every endofunctor *F* yields the *initial-algebra chain*, viz. a transfinite chain formed by the objects *F<sup>i</sup>*0 of A, as follows: *F*<sup>0</sup>0 = 0, the initial object; *F<sup>i</sup>*<sup>+1</sup>0 = *F*(*F<sup>i</sup>*0); and for a limit ordinal *i* we take the colimit of the chain (*F<sup>j</sup>*0)<sub>*j<i*</sub>. The connecting morphisms *w<sub>i,j</sub>* : *F<sup>i</sup>*0 → *F<sup>j</sup>*0 are defined by a similar transfinite recursion.

## **3 Recursive Coalgebras**

**Assumption 3.1.** We work with a standard set theory (e.g. Zermelo-Fraenkel), assuming the Axiom of Choice. In particular, we use transfinite induction on several occasions. (We are not concerned with constructive foundations in this paper.)

Throughout this paper we assume that A is a complete and well-powered category and that *F* : A → A preserves monomorphisms.

For A = Set the condition that *F* preserves monomorphisms may be dropped. In fact, preservation of non-empty monomorphisms is sufficient in general (for a suitable notion of non-empty monomorphism) [21, Lemma 2.5], and this holds for every set functor.

The following definition of recursive coalgebras was first given by Osius [22]. Taylor [24] speaks of *coalgebras obeying the recursion scheme*. Capretta et al. [10] extended the concept to *parametrically recursive* coalgebras by dualizing completely iterative algebras [20].

**Definition 3.2.** A coalgebra *α*: *A* → *F A* is called *recursive* if for every algebra *e*: *F X* → *X* there exists a unique coalgebra-to-algebra morphism *e*† : *A* → *X*, i.e. a unique morphism such that the square on the left below commutes:

$$\begin{array}{ccc} A & \xrightarrow{\;e^{\dagger}\;} & X \\ {\scriptstyle \alpha}\downarrow & & \uparrow{\scriptstyle e} \\ FA & \xrightarrow{\;Fe^{\dagger}\;} & FX \end{array} \qquad\qquad \begin{array}{ccc} A & \xrightarrow{\;e^{\dagger}\;} & X \\ {\scriptstyle \langle\alpha,\,\mathrm{id}\_A\rangle}\downarrow & & \uparrow{\scriptstyle e} \\ FA \times A & \xrightarrow{\;Fe^{\dagger}\times\mathrm{id}\_A\;} & FX \times A \end{array}$$

(*A, α*) is called *parametrically recursive* if for every morphism *e*: *F X* × *A* → *X* there is a unique morphism *e*† : *A* → *X* such that the square on the right above commutes.

**Example 3.3.** (1) A graph regarded as a coalgebra for P is recursive iff it has no infinite path. This is an immediate consequence of the General Recursion Theorem (see Corollary 5.6 and Example 4.5(2)).

(2) Let *ι*: *F*(*μF*) → *μF* be an initial algebra. By Lambek's Lemma, *ι* is an isomorphism, so we have a coalgebra *ι*<sup>−1</sup> : *μF* → *F*(*μF*). This coalgebra is (parametrically) recursive. By [20, Thm. 2.8], in dual form, it is precisely the terminal parametrically recursive coalgebra (see also [10, Prop. 7]).

(3) The initial coalgebra 0 → *F*0 is recursive.

(4) If (*C, γ*) is recursive so is (*F C, F γ*), see [10, Prop. 6].

(5) Colimits of recursive coalgebras in Coalg *F* are recursive. This is easy to prove, using that colimits of coalgebras are formed on the level of the underlying category.

(6) It follows from items (3)–(5) that in the initial-algebra chain from Remark 2.13 all coalgebras *w<sub>i,i+1</sub>* : *F<sup>i</sup>*0 → *F<sup>i</sup>*<sup>+1</sup>0, *i* ∈ Ord, are recursive.

(7) Every parametrically recursive coalgebra is recursive. (To see this, form for a given *e*: *F X* → *X* the morphism *ē* = *e* · *π* : *F X* × *A* → *X*, where *π* : *F X* × *A* → *F X* is the projection.) In Corollaries 5.6 and 5.9 we will see that the converse often holds.

Here is an example where the converse fails [3]. Let *R*: Set → Set be the functor defined in Example 2.2(4). Also, let *C* = {0*,* 1}, and define *γ* : *C* → *RC* by *γ*(0) = *γ*(1) = (0*,* 1). Then (*C, γ*) is a recursive coalgebra. Indeed, for every algebra *α*: *RA* → *A* the constant map *h*: *C* → *A* with *h*(0) = *h*(1) = *α*(*d*) is the unique coalgebra-to-algebra morphism.

However, (*C, γ*) is not parametrically recursive. To see this, consider any morphism *e*: *RX* × {0*,* 1} → *X* such that *RX* contains more than one pair (*x*<sub>0</sub>*, x*<sub>1</sub>), *x*<sub>0</sub> ≠ *x*<sub>1</sub>, with *e*((*x*<sub>0</sub>*, x*<sub>1</sub>)*, i*) = *x<sub>i</sub>* for *i* = 0*,* 1. Then each such pair yields a morphism *h*: *C* → *X* with *h*(*i*) = *x<sub>i</sub>* making the appropriate square commute. Thus, (*C, γ*) is not parametrically recursive.
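The recursiveness claim for (*C, γ*) can be verified by brute-force enumeration. The following Python sketch checks it for one sample *R*-algebra; the encoding of *RX* and the particular algebra *α* are our own choices:

```python
# RX = {(x, y) in X x X : x != y} + {d}; C = {0, 1}; gamma(0) = gamma(1) = (0, 1).
# A coalgebra-to-algebra morphism h satisfies h(i) = alpha(Rh(gamma(i))).

def R_apply(h, pair):
    """Rh on the pair summand: keep the pair if images differ, else collapse to d."""
    x, y = pair
    return (h[x], h[y]) if h[x] != h[y] else "d"

def solutions(alpha, A):
    """All maps h: {0, 1} -> A with h = alpha . Rh . gamma."""
    return [(h0, h1) for h0 in A for h1 in A
            if h0 == alpha[R_apply({0: h0, 1: h1}, (0, 1))]
            and h1 == alpha[R_apply({0: h0, 1: h1}, (0, 1))]]

# a sample R-algebra on A = {'p', 'q'}
alpha = {("p", "q"): "p", ("q", "p"): "q", "d": "p"}
assert solutions(alpha, ["p", "q"]) == [("p", "p")]  # only the constant map onto alpha(d)
```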

(8) Capretta et al. [11] showed that recursivity semantically models divide-and-conquer programs, as demonstrated by the example of Quicksort. For every linearly ordered set *A* (of data elements), Quicksort is usually defined as the recursive function *q* : *A*<sup>∗</sup> → *A*<sup>∗</sup> given by

$$q(\varepsilon) = \varepsilon \qquad \text{and} \qquad q(aw) = q(w\_{\le a}) \star (aq(w\_{> a})),$$

where *A*<sup>∗</sup> is the set of all lists on *A*, *ε* is the empty list, ⋆ is the concatenation of lists, and *w*<sub>≤*a*</sub> denotes the list of those elements of *w* which are less than or equal to *a*; analogously for *w*<sub>>*a*</sub>.

Now consider the functor *F X* = 1 + *A* × *X* × *X* on Set, where 1 = {•}, and form the coalgebra *s*: *A*<sup>∗</sup> → 1 + *A* × *A*<sup>∗</sup> × *A*<sup>∗</sup> given by

$$s(\varepsilon) = \bullet \qquad \text{and} \qquad s(aw) = (a, w\_{\le a}, w\_{>a}) \qquad \text{for } a \in A \text{ and } w \in A^\*.$$

We shall see that this coalgebra is recursive in Example 5.3. Thus, for the *F*-algebra *m*: 1 + *A* × *A*<sup>∗</sup> × *A*<sup>∗</sup> → *A*<sup>∗</sup> given by

$$m(\bullet) = \varepsilon \qquad \text{and} \qquad m(a, w, v) = w \star (av),$$

there exists a unique function *q* on *A*<sup>∗</sup> such that *q* = *m* · *F q* · *s*. Notice that the last equation reflects the idea that Quicksort is a divide-and-conquer algorithm. The coalgebra structure *s* divides a list into two parts *w*≤*<sup>a</sup>* and *w>a*. Then *F q* sorts these two smaller lists, and finally in the combine- (or conquer-) step, the algebra structure *m* merges the two sorted parts to obtain the desired whole sorted list.
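This reading of Quicksort as *q* = *m* · *F q* · *s* can be transcribed directly into Python. In the following sketch (our own encoding), None represents the element of 1, triples represent the summand *A* × *X* × *X*, and Python lists play the role of *A*<sup>∗</sup>:

```python
# Quicksort as the unique coalgebra-to-algebra morphism q = m . Fq . s
# for F X = 1 + A x X x X (None encodes the element of 1).

def s(w):
    """Coalgebra structure: divide a list around its head."""
    if not w:
        return None
    a, rest = w[0], w[1:]
    return (a, [x for x in rest if x <= a], [x for x in rest if x > a])

def m(t):
    """Algebra structure: conquer, i.e. merge the sorted parts."""
    if t is None:
        return []
    a, w, v = t
    return w + [a] + v

def q(w):
    """The unique q with q = m . Fq . s; F applies q to both list components."""
    t = s(w)
    if t is not None:
        a, left, right = t
        t = (a, q(left), q(right))
    return m(t)

assert q([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]
```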

Jeannin et al. [15, Sec. 4] provide a number of recursive functions arising in programming that are determined by recursivity of a coalgebra, e.g. the gcd of integers, the Ackermann function, and the Towers of Hanoi.
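For instance, Euclid's gcd arises in this style from a coalgebra that either stops with a result or produces a smaller instance. The following Python sketch uses our own encoding (a plain integer for the left summand of a functor of the shape *F X* = ℕ + *X*, a pair for the right), not the one of Jeannin et al.:

```python
# gcd via a coalgebra s on pairs for F X = N + X (our encoding:
# a plain int is the left summand N, a tuple is the right summand X).

def s(p):
    """Coalgebra structure: stop with n when m == 0, else recurse on (n % m, m)."""
    m, n = p
    return n if m == 0 else (n % m, m)

def gcd(p):
    """Coalgebra-to-algebra morphism for the evident algebra on N: unfold s."""
    while isinstance(p, tuple):
        p = s(p)
    return p

assert gcd((12, 18)) == 6 and gcd((0, 7)) == 7
```

The unfolding terminates because the first component strictly decreases, mirroring the well-foundedness that underlies recursivity here.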

## **4 The Next Time Operator and Well-Founded Coalgebras**

As we have mentioned in the Introduction, the main issue of this paper is the relationship between two concepts pertaining to coalgebras: recursiveness and well-foundedness. The concept of well-foundedness is well known for directed graphs (*G,* →): it means that there are no infinite directed paths *g*<sub>0</sub> → *g*<sub>1</sub> → ···. For a set *X* with a relation *R*, well-foundedness means that there are no *backwards* sequences ··· *R x*<sub>2</sub> *R x*<sub>1</sub> *R x*<sub>0</sub>, i.e. the converse of the relation is well-founded as a graph. Taylor [24, Def. 6.2.3] gave a more general category-theoretic formulation of well-foundedness. We observe here that his definition can be presented in a compact way, using an operator that generalizes the way one thinks of the semantics of the 'next time' operator of temporal logics for non-deterministic (or even probabilistic) automata and transition systems. It is also strongly related to the algebraic semantics of modal logic, where one passes from a graph *G* to a function on P*G*. Jacobs [14] defined and studied the 'next time' operator on coalgebras for Kripke polynomial set functors. This can be generalized to arbitrary functors as follows.

Recall that Sub(*A*) denotes the complete lattice of subobjects of *A*.

**Definition 4.1 [4, Def. 8.9].** Every coalgebra *α*: *A* → *F A* induces an endofunction on Sub(*A*), called the *next time operator*

$$
\bigcirc \colon \mathsf{Sub}(A) \to \mathsf{Sub}(A), \qquad \bigcirc(s) = \overleftarrow{\alpha}(Fs) \quad \text{for } s \in \mathsf{Sub}(A).
$$

In more detail: we define ○*S* and ○*s* by the pullback (4.1) below. (Being a pullback is indicated by the "corner" symbol.) In words, ○ assigns to each subobject *s*: *S* ↣ *A* the inverse image of *F s* under *α*. Since *F s* is a monomorphism, ○*s* is a monomorphism and ○(*s*) is (for every representation *s* of that subobject of *A*) uniquely determined.

$$\begin{array}{ccc} \bigcirc S & \longrightarrow & FS \\ {\scriptstyle \bigcirc s}\downarrow\;\lrcorner & & \downarrow{\scriptstyle Fs} \\ A & \xrightarrow{\;\alpha\;} & FA \end{array} \tag{4.1}$$

**Example 4.2.** (1) Let *A* be a graph, considered as a coalgebra for P : Set → Set. If *S* ⊆ *A* is a set of vertices, then ○*S* is the set of vertices all of whose successors belong to *S*.

(2) For the set functor *F X* = P(*Σ* × *X*) expressing labelled transition systems, the operator ○ for a coalgebra *α*: *A* → P(*Σ* × *A*) is the semantic counterpart of the next time operator of classical linear temporal logic, see e.g. Manna and Pnueli [18]. In fact, for a subset *S* ⊆ *A* we have that ○*S* consists of those states all of whose next states lie in *S*, in symbols:

$$
\bigcirc S = \{ x \in A \mid (s, y) \in \alpha(x) \text{ implies } y \in S, \text{ for all } s \in \Sigma \}.
$$
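For finite graphs, the operator ○ of Example 4.2(1) is directly computable; the following Python sketch (our own code) illustrates it on a three-vertex path:

```python
# Next time operator for a graph coalgebra alpha: A -> P(A):
# O(S) is the set of vertices all of whose successors lie in S.

def next_time(alpha, S):
    return {x for x in alpha if alpha[x] <= S}  # <= is subset comparison on sets

alpha = {0: {1}, 1: {2}, 2: set()}  # the path 0 -> 1 -> 2
assert next_time(alpha, set()) == {2}
assert next_time(alpha, {2}) == {1, 2}
assert next_time(alpha, {0, 1, 2}) == {0, 1, 2}  # the whole vertex set is a fixed point
```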

The next time operator allows a compact definition of well-foundedness as characterized by Taylor [24, Exercise VI.17] (see also [6, Corollary 2.19]):

**Definition 4.3.** A coalgebra is *well-founded* if *id<sup>A</sup>* is the only fixed point of its next time operator.

**Remark 4.4.** (1) Let us call a subcoalgebra *m*: (*B, β*) ↣ (*A, α*) *cartesian* provided that the square (4.2) is a pullback. Then (*A, α*) is well-founded iff it has no proper cartesian subcoalgebra. That is, if *m*: (*B, β*) ↣ (*A, α*) is a cartesian subcoalgebra, then *m* is an isomorphism. Indeed, the fixed points of next time are precisely the cartesian subcoalgebras.

$$\begin{array}{ccc} B & \xrightarrow{\;\beta\;} & FB \\ {\scriptstyle m}\downarrow & & \downarrow{\scriptstyle Fm} \\ A & \xrightarrow{\;\alpha\;} & FA \end{array} \tag{4.2}$$

(2) A coalgebra is well-founded iff ○ has a unique pre-fixed point, i.e. a unique *m* with ○*m* ≤ *m*. Indeed, since Sub(*A*) is a complete lattice, the least fixed point of a monotone map is its least pre-fixed point. Taylor's definition [24, Def. 6.3.2] uses that property: he calls a coalgebra well-founded iff ○ has no proper subobject as a pre-fixed point.

**Example 4.5.** (1) Consider a graph as a coalgebra *α*: *A* → P*A* for the power-set functor (see Example 2.1). A subcoalgebra is a subset *m*: *B* ↪ *A* which contains, with every vertex *v*, all neighbors of *v*. The coalgebra structure *β* : *B* → P*B* is then the domain-codomain restriction of *α*. To say that *B* is a cartesian subcoalgebra means that whenever a vertex of *A* has all its neighbors in *B*, it also lies in *B*. It follows that (*A, α*) is well-founded iff it has no infinite directed path, see [24, Example 6.3.3].

(2) If *μF* exists, then as a coalgebra it is well-founded. Indeed, in every pullback (4.2), since *α* = *ι*<sup>−1</sup> is invertible, so is *β*. The unique algebra homomorphism from *μF* to the algebra *β*<sup>−1</sup> : *F B* → *B* is clearly inverse to *m*.

(3) If a set functor *F* fulfils *F*∅ = ∅, then the only well-founded coalgebra is the empty one. Indeed, this follows from the fact that the empty coalgebra is a fixed point of ○. For example, a deterministic automaton over the input alphabet *Σ*, as a coalgebra for *F X* = {0*,* 1} × *X<sup>Σ</sup>*, is well-founded iff it is empty.

(4) A non-deterministic automaton may be considered as a coalgebra for the set functor *F X* = {0*,* 1} × (P*X*)<sup>*Σ*</sup>. It is well-founded iff its state transition graph is well-founded (i.e. has no infinite path). This follows from Corollary 4.10 below.

(5) A linear weighted automaton, i.e. a coalgebra for *F X* = *K* × *X<sup>Σ</sup>* on Vec<sub>*K*</sub>, is well-founded iff every path in its state transition graph eventually leads to 0. This means that every path starting in a given state leads to the state 0 after finitely many steps (where it stays).

**Notation 4.6.** Given a set functor *F*, we define for every set *X* the map *τ<sub>X</sub>* : *F X* → P*X* assigning to every element *x* ∈ *F X* the intersection of all subsets *m*: *M* ↪ *X* such that *x* lies in the image of *F m*:

$$\tau\_X(x) = \bigcap \{ m \mid m \colon M \hookrightarrow X \text{ satisfies } x \in Fm[FM] \}. \tag{4.3}$$

Recall that a functor *preserves intersections* if it preserves (wide) pullbacks of families of monomorphisms.

Gumm [13, Thm. 7.3] observed that for a set functor preserving intersections, the maps *τ<sub>X</sub>* : *F X* → P*X* in (4.3) form a "subnatural" transformation from *F* to the power-set functor P. Subnaturality means that (although these maps do not form a natural transformation in general) for every monomorphism *i*: *X* → *Y* we have a commutative square:

$$\begin{array}{ccc} FX & \xrightarrow{\;\tau\_X\;} & \mathcal{P}X \\ {\scriptstyle Fi}\downarrow & & \downarrow{\scriptstyle \mathcal{P}i} \\ FY & \xrightarrow{\;\tau\_Y\;} & \mathcal{P}Y \end{array} \tag{4.4}$$

**Remark 4.7.** As shown in [13, Thm. 7.4] and [23, Prop. 7.5], a set functor *F* preserves intersections iff the squares in (4.4) above are pullbacks. Moreover, *loc. cit.* and [13, Thm. 8.1] prove that *τ* : *F* → P is a natural transformation, provided *F* preserves inverse images and intersections.

**Definition 4.8.** Let *F* be a set functor. For every coalgebra *α*: *A* → *F A* its *canonical graph* is the following coalgebra for P: the composite $A \xrightarrow{\;\alpha\;} FA \xrightarrow{\;\tau\_A\;} \mathcal{P}A$.

Thanks to the subnaturality of *τ* one obtains the following results.

**Proposition 4.9.** *For every set functor F preserving intersections, the next time operator of a coalgebra* (*A, α*) *coincides with that of its canonical graph.*

**Corollary 4.10 [24, Rem. 6.3.4].** *A coalgebra for a set functor preserving intersections is well-founded iff its canonical graph is well-founded.*

**Example 4.11.** (1) For a (deterministic or non-deterministic) automaton, the canonical graph has an edge from *s* to *t* iff there is a transition from *s* to *t* for some input letter. Thus, we obtain the characterization of well-foundedness as stated in Example 4.5(3) and (4).

(2) Every polynomial functor *H<sub>Σ</sub>* : Set → Set preserves intersections. Thus, a coalgebra (*A, α*) is well-founded iff there are no infinite paths in its canonical graph. The canonical graph of *A* has an edge from *a* to *b* iff *α*(*a*) is of the form *σ*(*c*<sub>1</sub>*,...,c<sub>n</sub>*) for some *σ* ∈ *Σ<sub>n</sub>* and *b* is one of the *c<sub>i</sub>*'s.

(3) Thus, for the functor *F X* = 1 + *A* × *X* × *X*, the coalgebra (*A*<sup>∗</sup>*, s*) of Example 3.3(8) is easily seen to be well-founded via its canonical graph. Indeed, this graph has for every nonempty list *aw* one outgoing edge to the list *w*<sub>≤*a*</sub> and one to *w*<sub>>*a*</sub>, both strictly shorter than *aw* (and *ε* has no outgoing edges). Hence, this is a well-founded graph.

**Lemma 4.12.** *The next time operator is monotone: if m* ≤ *n, then* ○*m* ≤ ○*n.*

**Lemma 4.13.** *Let α*: *A* → *F A be a coalgebra and m*: *B* ↣ *A a subobject.*

(1) *There is a coalgebra structure β* : *B* → *F B for which m gives a subcoalgebra of* (*A, α*) *iff m* ≤ ○*m.*

(2) *There is a coalgebra structure β* : *B* → *F B for which m gives a cartesian subcoalgebra of* (*A, α*) *iff m* = ○*m.*

**Lemma 4.14.** *For every coalgebra homomorphism f* : (*B, β*) → (*A, α*) *we have*

$$
\bigcirc\_{\beta} \cdot \overleftarrow{f} \le \overleftarrow{f} \cdot \bigcirc\_{\alpha},
$$

*where* ○<sub>*α*</sub> *and* ○<sub>*β*</sub> *denote the next time operators of the coalgebras* (*A, α*) *and* (*B, β*)*, respectively, and* ≤ *is the pointwise order.*

**Corollary 4.15.** *For every coalgebra homomorphism f* : (*B, β*) → (*A, α*) *we have* ○<sub>*β*</sub> · ←−*f* = ←−*f* · ○<sub>*α*</sub>*, provided that either*


**Definition 4.16 [4].** The *well-founded part* of a coalgebra is its largest wellfounded subcoalgebra.

The well-founded part of a coalgebra always exists and is the coreflection in the category of well-founded coalgebras [6, Prop. 2.27]. We provide a new, shorter proof of this fact. The well-founded part is obtained by the following:

**Construction 4.17 [6, Not. 2.22].** Let *α*: *A* → *F A* be a coalgebra. We know that Sub(*A*) is a complete lattice and that the next time operator ○ is monotone (see Lemma 4.12). Hence, by the Knaster-Tarski fixed point theorem, ○ has a least fixed point, which we denote by *a*<sup>∗</sup> : *A*<sup>∗</sup> ↣ *A*.

By Lemma 4.13(2), we know that there is a coalgebra structure *α*<sup>∗</sup> : *A*<sup>∗</sup> → *F A*<sup>∗</sup> so that *a*<sup>∗</sup> : (*A*<sup>∗</sup>*, α*<sup>∗</sup>) ↣ (*A, α*) is the smallest cartesian subcoalgebra of (*A, α*).

**Proposition 4.18.** *For every coalgebra* (*A, α*)*, the coalgebra* (*A*<sup>∗</sup>*, α*<sup>∗</sup>) *is well-founded.*

*Proof.* Let *m*: (*B, β*) ↣ (*A*<sup>∗</sup>*, α*<sup>∗</sup>) be a cartesian subcoalgebra. By Lemma 4.13, *a*<sup>∗</sup> · *m*: *B* → *A* is a fixed point of ○. Since *a*<sup>∗</sup> is the least fixed point, we have *a*<sup>∗</sup> ≤ *a*<sup>∗</sup> · *m*, i.e. *a*<sup>∗</sup> = *a*<sup>∗</sup> · *m* · *x* for some *x*: *A*<sup>∗</sup> ↣ *B*. Since *a*<sup>∗</sup> is monic, we thus have *m* · *x* = *id*<sub>*A*<sup>∗</sup></sub>. So *m* is a monomorphism and a split epimorphism, whence an isomorphism.

**Proposition 4.19.** *The full subcategory of* Coalg *F given by well-founded coalgebras is coreflective. In fact, the well-founded coreflection of a coalgebra* (*A, α*) *is its well-founded part a*<sup>∗</sup> : (*A*<sup>∗</sup>*, α*<sup>∗</sup>) ↣ (*A, α*)*.*

*Proof.* We are to prove that for every coalgebra homomorphism *f* : (*B, β*) → (*A, α*), where (*B, β*) is well-founded, there exists a coalgebra homomorphism *f̄* : (*B, β*) → (*A*<sup>∗</sup>*, α*<sup>∗</sup>) such that *a*<sup>∗</sup> · *f̄* = *f*. The uniqueness is easy.

For the existence of *f̄*, we first observe that ←−*f*(*a*<sup>∗</sup>) is a pre-fixed point of ○<sub>*β*</sub>: indeed, using Lemma 4.14 we have ○<sub>*β*</sub>(←−*f*(*a*<sup>∗</sup>)) ≤ ←−*f*(○<sub>*α*</sub>(*a*<sup>∗</sup>)) = ←−*f*(*a*<sup>∗</sup>). By Remark 4.4(2), we therefore have *id<sub>B</sub>* = *b*<sup>∗</sup> ≤ ←−*f*(*a*<sup>∗</sup>) in Sub(*B*). Using the adjunction of Lemma 2.11, we have −→*f*(*id<sub>B</sub>*) ≤ *a*<sup>∗</sup> in Sub(*A*). Now factorize *f* as *m* · *e* with *e*: *B* ↠ *C* and *m*: *C* ↣ *A*. We have −→*f*(*id<sub>B</sub>*) = *m*, and we then obtain *m* = −→*f*(*id<sub>B</sub>*) ≤ *a*<sup>∗</sup>, i.e. there exists a morphism *h*: *C* ↣ *A*<sup>∗</sup> such that *a*<sup>∗</sup> · *h* = *m*. Thus, *f̄* = *h* · *e*: *B* → *A*<sup>∗</sup> is a morphism satisfying *a*<sup>∗</sup> · *f̄* = *a*<sup>∗</sup> · *h* · *e* = *m* · *e* = *f*. It follows that *f̄* is a coalgebra homomorphism from (*B, β*) to (*A*<sup>∗</sup>*, α*<sup>∗</sup>) since *f* and *a*<sup>∗</sup> are and *F* preserves monomorphisms.

**Construction 4.20 [6, Not. 2.22].** Let (*A, α*) be a coalgebra. We obtain *a*<sup>∗</sup>, the least fixed point of ○, as the join of the following transfinite chain of subobjects *a<sub>i</sub>* : *A<sub>i</sub>* ↣ *A*, *i* ∈ Ord. First, put *a*<sub>0</sub> = ⊥<sub>*A*</sub>, the least subobject of *A*. Given *a<sub>i</sub>* : *A<sub>i</sub>* ↣ *A*, put *a<sub>i</sub>*<sub>+1</sub> = ○*a<sub>i</sub>* : *A<sub>i</sub>*<sub>+1</sub> = ○*A<sub>i</sub>* ↣ *A*. For every limit ordinal *j*, put *a<sub>j</sub>* = ⋁<sub>*i<j*</sub> *a<sub>i</sub>*. Since Sub(*A*) is a set, there exists an ordinal *i* such that *a<sub>i</sub>* = *a*<sup>∗</sup> : *A*<sup>∗</sup> ↣ *A*.

**Remark 4.21.** Note that, whenever monomorphisms are smooth, we have *A*<sub>0</sub> = 0 and the above join *a<sub>j</sub>* is obtained as the colimit of the chain of the subobjects *a<sub>i</sub>* : *A<sub>i</sub>* ↣ *A*, *i < j* (see Remark 2.12).

If *F* is a finitary functor on a locally finitely presentable category, then the least ordinal *i* with *a*<sup>∗</sup> = *a<sup>i</sup>* is at most *ω*, but in general one needs transfinite iteration to reach a fixed point.

**Example 4.22.** Let (*A, α*) be a graph regarded as a coalgebra for P (see Example 2.1). Then *A*<sub>0</sub> = ∅, *A*<sub>1</sub> is formed by all leaves, i.e. those nodes with no neighbors, *A*<sub>2</sub> by all leaves and all nodes all of whose neighbors are leaves, etc. We see that a node *x* lies in *A<sub>i</sub>*<sub>+1</sub> iff every path starting in *x* has length at most *i*. Hence *A*<sup>∗</sup> = *A<sub>ω</sub>* is the set of all nodes from which no infinite path starts.
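For a finite graph, Construction 4.20 stabilizes after finitely many steps, so the well-founded part can be computed by simply iterating ○ from the empty subset; a Python sketch (our own code):

```python
# Construction 4.20 for finite graphs: iterate the next time operator
# from the empty subset; the least fixed point A* is the set of nodes
# from which no infinite path starts.

def well_founded_part(alpha):
    S = set()
    while True:
        T = {x for x in alpha if alpha[x] <= S}  # the next time operator O(S)
        if T == S:
            return S  # least fixed point reached
        S = T

# 0 -> 1 -> 2, plus a cycle 3 <-> 4 reachable from 1
alpha = {0: {1}, 1: {2, 3}, 2: set(), 3: {4}, 4: {3}}
assert well_founded_part(alpha) == {2}  # every other node reaches the cycle
```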

We close with a general fact on well-founded parts of *fixed points* (i.e. (co)algebras whose structure is invertible). The following result generalizes [15, Cor. 3.4], and it also appeared before for functors preserving finite intersections [4, Theorem 8.16 and Remark 8.18]. Here we lift the latter assumption (see [5, Theorem 7.6] for the new proof):

**Theorem 4.23.** *Let* A *be a complete and well-powered category with smooth monomorphisms. For F preserving monomorphisms, the well-founded part of every fixed point is an initial algebra. In particular, the only well-founded fixed point is the initial algebra.*

**Example 4.24.** We illustrate that for a set functor *F* preserving monomorphisms, the well-founded part of the terminal coalgebra is the initial algebra. Consider *F X* = *A* × *X* + 1. The terminal coalgebra is the set *A*<sup>∞</sup> ∪ *A*<sup>∗</sup> of finite and infinite sequences from the set *A*. The initial algebra is *A*∗. It is easy to check that *A*<sup>∗</sup> is the well-founded part of *A*<sup>∞</sup> ∪ *A*∗.

## **5 The General Recursion Theorem and its Converse**

The main consequence of well-foundedness is parametric recursivity. This is Taylor's General Recursion Theorem [24, Theorem 6.3.13]. Taylor assumed that *F* preserves inverse images. We present a new proof for which it is sufficient that *F* preserves monomorphisms, assuming those are smooth.

**Theorem 5.1 (General Recursion Theorem).** *Let* A *be a complete and wellpowered category with smooth monomorphisms. For F* : A → A *preserving monomorphisms, every well-founded coalgebra is parametrically recursive.*

*Proof sketch.* (1) Let (*A, α*) be well-founded. We first prove that it is recursive. We use the subobjects *a<sub>i</sub>* : *A<sub>i</sub>* ↣ *A* of Construction 4.20<sup>4</sup>, the corresponding

<sup>4</sup> One might object to this use of transfinite recursion, since Theorem 5.1 itself could be used as a justification for transfinite recursion. Let us emphasize that we are not presenting Theorem 5.1 as a foundational contribution. We are building on the classical theory of transfinite recursion.

morphisms *A<sub>i</sub>*<sub>+1</sub> = ○*A<sub>i</sub>* → *F A<sub>i</sub>* from the pullback (4.1), and the recursive coalgebras (*F<sup>i</sup>*0*, w<sub>i,i+1</sub>*) of Example 3.3(6). We obtain a natural transformation *h* from the chain (*A<sub>i</sub>*) in Construction 4.20 to the initial-algebra chain (*F<sup>i</sup>*0) (see Remark 2.13) by transfinite recursion.

Now for every algebra *e*: *FX* → *X*, we obtain a unique coalgebra-to-algebra morphism *f*<sub>*i*</sub>: *F*<sup>*i*</sup>0 → *X*, i.e. we have that *f*<sub>*i*</sub> = *e* · *Ff*<sub>*i*</sub> · *w*<sub>*i*,*i*+1</sub>. Since (*A, α*) is well-founded, we know that *a*<sub>*i*</sub> = *id*<sub>*A*</sub> for some *i*. From this it is not difficult to prove that *f*<sub>*i*</sub> · *h*<sub>*i*</sub> is a coalgebra-to-algebra morphism from (*A, α*) to (*X, e*).

In order to prove uniqueness, we prove by transfinite induction that for any given coalgebra-to-algebra homomorphism *e*<sup>†</sup>, one has *e*<sup>†</sup> · *a*<sub>*j*</sub> = *f*<sub>*j*</sub> · *h*<sub>*j*</sub> for every ordinal number *j*. Then for the above ordinal number *i* with *a*<sub>*i*</sub> = *id*<sub>*A*</sub>, we have *e*<sup>†</sup> = *f*<sub>*i*</sub> · *h*<sub>*i*</sub>, as desired. This shows that (*A, α*) is recursive.

(2) We prove that (*A, α*) is parametrically recursive. Consider the coalgebra ⟨*α*, *id*<sub>*A*</sub>⟩: *A* → *FA* × *A* for the functor *F*(−) × *A*. This functor preserves monomorphisms since *F* does and monomorphisms are closed under products. The next time operator on Sub(*A*) is the same for both coalgebras, since the square (4.1) is a pullback if and only if the square on the right below is one.

Since *id*<sub>*A*</sub> is the unique fixed point of the next time operator w.r.t. *F* (see Definition 4.3), it is also its unique fixed point w.r.t. *F*(−) × *A*. Thus, (*A*, ⟨*α*, *id*<sub>*A*</sub>⟩) is a well-founded coalgebra for *F*(−) × *A*. By the previous argument, this coalgebra is thus recursive for *F*(−) × *A*; equivalently, (*A, α*) is parametrically recursive for *F*.

**Theorem 5.2.** *For every endofunctor on* Set *or* Vec<sub>*K*</sub> *(vector spaces and linear maps), every well-founded coalgebra is parametrically recursive.*

*Proof sketch.* For Set, we apply Theorem 5.1 to the Trnková hull *F̄* (see Proposition 2.3), noting that *F* and *F̄* have the same (non-empty) coalgebras. Moreover, one can show that every well-founded (or recursive) *F*-coalgebra is a well-founded (resp. recursive) *F̄*-coalgebra. For Vec<sub>*K*</sub>, observe that monomorphisms split and are therefore preserved by every endofunctor *F*.

**Example 5.3.** We saw in Example 4.11(3) that for *FX* = 1 + *A* × *X* × *X* the coalgebra (*A, s*) from Example 3.3(8) is well-founded, and therefore it is (parametrically) recursive.
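The practical content of recursiveness is that functions out of a well-founded coalgebra may be defined by recursion: unfold with the coalgebra, then evaluate with the algebra. A small sketch for *FX* = 1 + *A* × *X* × *X* (our own encoding; the interval-splitting coalgebra below is a hypothetical example, not the coalgebra (*A, s*) of Example 3.3(8)):

```python
def coalg_to_alg(alpha, e, a):
    """The unique coalgebra-to-algebra morphism ('unfold then fold') for
    F X = 1 + A x X x X; the recursion terminates because the coalgebra
    is assumed well-founded."""
    t = alpha(a)
    if t is None:                       # the '1' summand
        return e(None)
    label, left, right = t
    return e((label, coalg_to_alg(alpha, e, left),
                     coalg_to_alg(alpha, e, right)))

# A well-founded coalgebra: split a half-open interval [lo, hi) at its
# midpoint until it is empty; the midpoint is the A-label.
def alpha(iv):
    lo, hi = iv
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return (mid, (lo, mid), (mid + 1, hi))

# An algebra summing all labels; each element of [lo, hi) occurs
# exactly once as a midpoint label.
e = lambda t: 0 if t is None else t[0] + t[1] + t[2]
print(coalg_to_alg(alpha, e, (0, 4)))  # sum of 0..3, i.e. 6
```

Well-foundedness of the coalgebra is exactly what guarantees that this recursion reaches the `None` leaves and terminates.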

**Example 5.4.** Well-founded coalgebras need not be recursive when *F* does not preserve monomorphisms. We take A to be the category of *sets with a predicate*, i.e. pairs (*X, A*) where *A* ⊆ *X*. Morphisms *f*: (*X, A*) → (*Y, B*) satisfy *f*[*A*] ⊆ *B*. Denote by **1** the terminal object (1, 1). We define an endofunctor *F* by *F*(*X*, ∅) = (*X* + 1, ∅), and *F*(*X, A*) = **1** for *A* ≠ ∅. For a morphism *f*: (*X, A*) → (*Y, B*), put *Ff* = *f* + *id* if *B* = ∅ (and hence *A* = ∅); if *A* ≠ ∅, then also *B* ≠ ∅ and *Ff* is *id*: **1** → **1**; in the remaining case *Ff* is the unique morphism into **1**.

The terminal coalgebra is *id*: **1** → **1**, and it is easy to see that it is well-founded. But it is not recursive: there are no coalgebra-to-algebra morphisms into an algebra of the form *F*(*X*, ∅) → (*X*, ∅).

We next prove a converse to Theorem 5.1: "recursive =⇒ well-founded". Related results appear in Taylor [23, 24], Adámek et al. [3] and Jeannin et al. [15].

Recall universally smooth monomorphisms from Definition 2.8(2). A *pre-fixed point* of *F* is a monic algebra *α*: *FA* ↣ *A*.

**Theorem 5.5.** *Let* A *be a complete and well-powered category with universally smooth monomorphisms, and suppose that F*: A → A *preserves inverse images and has a pre-fixed point. Then every recursive coalgebra is well-founded.*

*Proof.* (1) We first observe that an initial algebra exists. This follows from results by Trnková et al. [25], as we now briefly recall. Recall the initial-algebra chain from Remark 2.13. Let *β*: *FB* ↣ *B* be a pre-fixed point. Then there is a unique cocone *β*<sub>*i*</sub>: *F*<sup>*i*</sup>0 → *B* satisfying *β*<sub>*i*+1</sub> = *β* · *Fβ*<sub>*i*</sub>. Moreover, each *β*<sub>*i*</sub> is monomorphic. Since *B* has only a set of subobjects, there is some *λ* such that for every *i* ≥ *λ*, all of the morphisms *β*<sub>*i*</sub> represent the same subobject of *B*. Consequently, *w*<sub>*λ*,*λ*+1</sub> of Remark 2.13 is an isomorphism, due to *β*<sub>*λ*</sub> = *β*<sub>*λ*+1</sub> · *w*<sub>*λ*,*λ*+1</sub>. Then *μF* = *F*<sup>*λ*</sup>0 with the structure *ι* = *w*<sup>−1</sup><sub>*λ*,*λ*+1</sub>: *F*(*μF*) → *μF* is an initial algebra.

(2) Now suppose that (*A, α*) is a recursive coalgebra. Then there exists a unique coalgebra homomorphism *h*: (*A, α*) → (*μF*, *ι*<sup>−1</sup>). Let us abbreviate *w*<sub>*i*,*λ*</sub> by *c*<sub>*i*</sub>: *F*<sup>*i*</sup>0 ↣ *μF*, and recall the subobjects *a*<sub>*i*</sub>: *A*<sub>*i*</sub> ↣ *A* from Construction 4.20. We will prove by transfinite induction that *a*<sub>*i*</sub> is the inverse image of *c*<sub>*i*</sub> under *h*; in symbols: *a*<sub>*i*</sub> = ←−*h*(*c*<sub>*i*</sub>) for all ordinals *i*. Then it follows that *a*<sub>*λ*</sub> is an isomorphism, since so is *c*<sub>*λ*</sub>, whence (*A, α*) is well-founded.

In the base case *i* = 0 this is clear, since *A*<sub>0</sub> = *W*<sub>0</sub> = 0 (writing *W*<sub>*i*</sub> for *F*<sup>*i*</sup>0) is a strict initial object.

For the isolated step, we compute the pullback of *c*<sub>*i*+1</sub>: *W*<sub>*i*+1</sub> → *μF* along *h* using the following diagram:

By the induction hypothesis and since *F* preserves inverse images, the middle square above is a pullback. Since the structure map *ι* of the initial algebra is an isomorphism, it follows that the middle square pasted with the right-hand triangle is also a pullback. Finally, the left-hand square is a pullback by the definition of *ai*+1. Thus, the outside of the above diagram is a pullback, as required.

For a limit ordinal *j*, we know that *a*<sub>*j*</sub> = ⋁<sub>*i*<*j*</sub> *a*<sub>*i*</sub> and, similarly, *c*<sub>*j*</sub> = ⋁<sub>*i*<*j*</sub> *c*<sub>*i*</sub>, since *W*<sub>*j*</sub> = colim<sub>*i*<*j*</sub> *W*<sub>*i*</sub> and monomorphisms are smooth (see Remark 2.12(2)). Using Remark 2.12(3) and the induction hypothesis we thus obtain ←−*h*(*c*<sub>*j*</sub>) = ←−*h*(⋁<sub>*i*<*j*</sub> *c*<sub>*i*</sub>) = ⋁<sub>*i*<*j*</sub> ←−*h*(*c*<sub>*i*</sub>) = ⋁<sub>*i*<*j*</sub> *a*<sub>*i*</sub> = *a*<sub>*j*</sub>.

**Corollary 5.6.** *Let* A *and F satisfy the assumptions of Theorem 5.5. Then the following properties of a coalgebra are equivalent:*

(1) *well-foundedness;*
(2) *parametric recursiveness;*
(3) *recursiveness;*
(4) *existence of a coalgebra homomorphism into* (*μF*, *ι*<sup>−1</sup>)*;*
(5) *existence of a coalgebra homomorphism into some well-founded coalgebra.*
*Proof sketch.* We already know (1) ⇒ (2) ⇒ (3). Since *F* has an initial algebra (as proved in Theorem 5.5), the implication (3) ⇒ (4) follows from Example 3.3(2). In Theorem 5.5 we also proved (4) ⇒ (1). The implication (4) ⇒ (5) follows from Example 4.5(2). Finally, it follows from [6, Remark 2.40] that (*μF*, *ι*<sup>−1</sup>) is a terminal well-founded coalgebra, whence (5) ⇒ (4).

**Example 5.7.** (1) The category of many-sorted sets satisfies the assumptions of Theorem 5.5, and polynomial endofunctors on that category preserve inverse images. Thus, we obtain Jeannin et al.'s result [15, Thm. 3.3] that (1)–(4) in Corollary 5.6 are equivalent as a special instance.

(2) The implication (4) ⇒ (3) in Corollary 5.6 does not hold for vector spaces. In fact, for the identity functor on Vec<sub>*K*</sub> we have *μId* = (0, *id*). Hence, every coalgebra has a homomorphism into *μId*. However, not every coalgebra is recursive; e.g. the coalgebra (*K, id*) admits many coalgebra-to-algebra morphisms to the algebra (*K, id*). Similarly, the implication (4) ⇒ (1) does not hold.

We also wish to mention a result due to Taylor [23, Rem. 3.8]. It uses the concept of a *subobject classifier* originating in [17] and prominent in topos theory. This is an object *Ω* with a subobject *t*: 1 ↣ *Ω* such that for every subobject *b*: *B* ↣ *A* there is a unique *b̂*: *A* → *Ω* such that *b* is the inverse image of *t* under *b̂*. By definition, every elementary topos has a subobject classifier, in particular every category Set<sup>C</sup> with C small.

Our standing assumption that A is a complete and well-powered category is not needed for the next result: finite limits are sufficient.

**Theorem 5.8 (Taylor [23]).** *Let F be an endofunctor preserving inverse images on a finitely complete category with a subobject classifier. Then every recursive coalgebra is well-founded.*

**Corollary 5.9.** *For every set functor preserving inverse images, the following properties of a coalgebra are equivalent:*

*well-foundedness* ⇐⇒ *parametric recursiveness* ⇐⇒ *recursiveness.*

**Example 5.10.** The hypothesis in Theorems 5.5 and 5.8 that the functor preserves inverse images cannot be dropped. To see this, consider the functor *R*: Set → Set of Example 2.2(4). It preserves monomorphisms but not inverse images. The coalgebra *A* = {0, 1} with the structure *α* constant at (0, 1) is recursive: given an algebra *β*: *RB* → *B*, the unique coalgebra-to-algebra homomorphism *h*: {0, 1} → *B* is given by *h*(0) = *h*(1) = *β*(*d*). But *A* is not well-founded: ∅ is a cartesian subcoalgebra.

Recall that an initial algebra (*μF*, *ι*) is also considered as a coalgebra (*μF*, *ι*<sup>−1</sup>). Taylor [23, Cor. 9.9] showed that, for functors preserving inverse images, the terminal well-founded coalgebra is the initial algebra. Surprisingly, this result is true for *all* set functors.

**Theorem 5.11 [6, Thm. 2.46].** *For every set functor, a terminal well-founded coalgebra is precisely an initial algebra.*

**Theorem 5.12.** *For every functor on* Vec*<sup>K</sup> preserving inverse images, the following properties of a coalgebra are equivalent:*

*well-foundedness* ⇐⇒ *parametric recursiveness* ⇐⇒ *recursiveness.*

## **6 Closure Properties of Well-founded Coalgebras**

In this section we will see that strong quotients and subcoalgebras (see Remark 2.7) of well-founded coalgebras are well-founded again. We mention the following corollary to Proposition 4.19. For endofunctors on sets preserving inverse images this was stated by Taylor [24, Exercise VI.16]:

**Proposition 6.1.** *The subcategory of* Coalg *F formed by all well-founded coalgebras is closed under strong quotients and coproducts in* Coalg *F.*

This follows from a general result on coreflective subcategories [2, Thm. 16.8]: the category Coalg *F* has the factorization system of Proposition 2.6, and its full subcategory of well-founded coalgebras is coreflective with monomorphic coreflections (see Proposition 4.19). Consequently, it is closed under strong quotients and colimits.

We prove next that, for an endofunctor preserving finite intersections, well-founded coalgebras are closed under subcoalgebras, provided that the complete lattice Sub(*A*) is a *frame*. This means that for every subobject *m*: *B* ↣ *A* and every family *m*<sub>*i*</sub> (*i* ∈ *I*) of subobjects of *A* we have *m* ∧ ⋁<sub>*i*∈*I*</sub> *m*<sub>*i*</sub> = ⋁<sub>*i*∈*I*</sub> (*m* ∧ *m*<sub>*i*</sub>). Equivalently, ←−*m*: Sub(*A*) → Sub(*B*) (see Notation 2.10) has a right adjoint *m*<sub>∗</sub>: Sub(*B*) → Sub(*A*).
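In Set this adjunction is concrete: for an inclusion *m*: *B* ⊆ *A*, the inverse image of *S* is *S* ∩ *B*, and its right adjoint sends *T* to *T* ∪ (*A* \ *B*). A small self-check (our own illustration, not from the paper):

```python
from itertools import combinations

A = frozenset(range(5))
B = frozenset({0, 1, 2})            # a subobject m: B >-> A

preim = lambda S: S & B             # inverse image  Sub(A) -> Sub(B)
mstar = lambda T: T | (A - B)       # right adjoint  Sub(B) -> Sub(A)

def subsets(X):
    return [frozenset(c) for k in range(len(X) + 1)
            for c in combinations(X, k)]

# Adjunction: preim(S) <= T  iff  S <= mstar(T), checked exhaustively.
ok = all((preim(S) <= T) == (S <= mstar(T))
         for S in subsets(A) for T in subsets(B))
print(ok)  # True
```

This is exactly the Galois connection ←−*m* ⊣ *m*<sub>∗</sub> whose counit is used in the proof of Proposition 6.2.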

This property holds for Set as well as for the categories of posets, graphs, topological spaces, and presheaf categories Set<sup>C</sup>, C small. Moreover, it holds in every Grothendieck topos. The categories of complete partial orders and Vec<sub>*K*</sub> do not satisfy this requirement.

**Proposition 6.2.** *Suppose that F preserves finite intersections, and let* (*A, α*) *be a well-founded coalgebra such that* Sub(*A*) *is a frame. Then every subcoalgebra of* (*A, α*) *is well-founded.*

*Proof.* Let *m*: (*B, β*) ↣ (*A, α*) be a subcoalgebra. We will show that the only pre-fixed point of *β* is *id*<sub>*B*</sub> (cf. Remark 4.4(2)). Suppose that *s*: *S* ↣ *B* fulfils *β*(*s*) ≤ *s*. Since *F* preserves finite intersections, we have ←−*m* · *α* = *β* · ←−*m* by Corollary 4.15(1). The counit of the adjunction ←−*m* ⊣ *m*<sub>∗</sub> yields ←−*m*(*m*<sub>∗</sub>(*s*)) ≤ *s*, so that we obtain ←−*m*(*α*(*m*<sub>∗</sub>(*s*))) = *β*(←−*m*(*m*<sub>∗</sub>(*s*))) ≤ *β*(*s*) ≤ *s*. Using the adjunction ←−*m* ⊣ *m*<sub>∗</sub> again, we have equivalently that *α*(*m*<sub>∗</sub>(*s*)) ≤ *m*<sub>∗</sub>(*s*); i.e. *m*<sub>∗</sub>(*s*) is a pre-fixed point of *α*. Since (*A, α*) is well-founded, Corollary 4.15(1) implies that *m*<sub>∗</sub>(*s*) = *id*<sub>*A*</sub>. Since ←−*m* is also a right adjoint and therefore preserves the top element of Sub(*B*), we thus obtain *id*<sub>*B*</sub> = ←−*m*(*id*<sub>*A*</sub>) = ←−*m*(*m*<sub>∗</sub>(*s*)) ≤ *s*.

**Remark 6.3.** Given a set functor *F* preserving inverse images, a stronger result was proved by Taylor [24, Corollary 6.3.6]: for every coalgebra homomorphism *f*: (*B, β*) → (*A, α*) with (*A, α*) well-founded, (*B, β*) is well-founded as well. In fact, our proof above is essentially Taylor's.

**Corollary 6.4.** *If a set functor preserves finite intersections, then subcoalgebras of well-founded coalgebras are well-founded.*

Trnková [26] proved that every set functor preserves all *nonempty* finite intersections. However, this does not suffice for Corollary 6.4:

**Example 6.5.** A well-founded coalgebra for a set functor can have non-well-founded subcoalgebras. Let *F*∅ = 1 and *FX* = 1 + 1 for all nonempty sets *X*, and let *Ff* = inl: 1 → 1 + 1 be the left-hand injection for all maps *f*: ∅ → *X* with *X* nonempty. The coalgebra inr: 1 → *F*1 is not well-founded because its empty subcoalgebra is cartesian. However, it is a subcoalgebra of *id*: 1 + 1 → 1 + 1 (via the embedding inr), and the latter is well-founded.

The fact that subcoalgebras of a well-founded coalgebra are well-founded does not necessarily require the assumption that Sub(*A*) is a frame. Instead, one may assume that the class of monomorphisms is universally smooth:

**Theorem 6.6.** *If* A *has universally smooth monomorphisms and F preserves finite intersections, every subcoalgebra of a well-founded coalgebra is well-founded.*

## **7 Conclusions**

Well-founded coalgebras, introduced by Taylor [24], have a compact definition based on an extension of Jacobs' 'next time' operator. Our main contribution is a new proof of Taylor's General Recursion Theorem, stating that every well-founded coalgebra is recursive; we generalize this result to all endofunctors preserving monomorphisms on a complete and well-powered category with smooth monomorphisms. For functors preserving inverse images, we have also seen two variants of the converse implication "recursive ⇒ well-founded", under additional hypotheses: one due to Taylor for categories with a subobject classifier, and a second one for categories with universally smooth monomorphisms, provided that the functor has a pre-fixed point. Various counterexamples demonstrate that all our hypotheses are necessary.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Timed Negotiations**<sup>⋆</sup>

S. Akshay<sup>1</sup>, Blaise Genest<sup>2</sup>, Loïc Hélouët<sup>3</sup>, and Sharvik Mital<sup>1</sup>

<sup>1</sup> IIT Bombay, Mumbai, India. {akshayss,sharky}@cse.iitb.ac.in
<sup>2</sup> Univ Rennes, CNRS, IRISA, Rennes, France. blaise.genest@irisa.fr
<sup>3</sup> Univ Rennes, Inria, Rennes, France. loic.helouet@inria.fr

**Abstract.** Negotiations were introduced in [6] as a model for concurrent systems with multiparty decisions. What makes negotiations very appealing is that they are one of the very few non-trivial concurrent models in which several interesting problems, such as soundness, i.e. absence of deadlocks, can be solved in PTIME [3]. In this paper, we introduce the model of timed negotiations and consider the problem of computing the minimum and the maximum execution times of a negotiation. The latter can be solved using the algorithm of [10] for computing costs in negotiations, but, surprisingly, the minimum execution time cannot.

This paper proposes new algorithms to compute both the minimum and the maximum execution time, which work on much more general classes of negotiations than [10], which only considered sound and deterministic negotiations. Further, we uncover the precise complexities of these questions, ranging from PTIME to Δ<sup>P</sup><sub>2</sub>-complete. In particular, we show that computing the minimum execution time is more complex than computing the maximum execution time in most classes of negotiations we consider.

## **1 Introduction**

Distributed systems are notoriously difficult to analyze, mainly due to the explosion of the number of configurations that have to be considered to answer even simple questions. A challenging task is then to propose models on which analysis can be performed with tractable complexities, preferably within polynomial time. Free choice Petri nets are a classical model of distributed systems that allow for efficient verification, in particular when the nets are 1-safe [4, 5].

Recently, [6] introduced a new model called *negotiations* for workflows and business processes. A negotiation describes how processes interact in a distributed system: a subset of processes in a node of the system takes a synchronous decision among several *outcomes*. The chosen outcome sends the contributing processes to a new set of nodes. The execution of a negotiation ends when the processes reach a *final configuration*. Negotiations can be deterministic (once an outcome is fixed, each process knows its unique successor node) or not.

Negotiations are an interesting model since several properties can be decided with a reasonable complexity. The question of *soundness*, i.e., deadlock-freedom:

<sup>⋆</sup> Supported by DST/CEFIPRA/INRIA Associated team EQuaVE and DST/SERB Matrices grant MTR/2018/000744.

© The Author(s) 2020

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 37–56, 2020. https://doi.org/10.1007/978-3-030-45231-5_3

whether from every reachable configuration one can reach a final configuration, is PSPACE-complete. However, for deterministic negotiations, it can be decided in PTIME [7]. The decision procedure uses reduction rules. Reduction techniques were originally proposed for Petri nets [2, 8, 11, 16]. The main idea is to define transformation rules that produce a model of smaller size than the original one, while preserving the property under analysis. In the context of negotiations, [7, 3] proposed a sound and complete set of soundness-preserving reduction rules and algorithms to apply these rules efficiently. The question of soundness for deterministic negotiations was revisited in [9] and shown to be NLOGSPACE-complete using anti-patterns instead of reduction rules. Further, [9] shows that the PTIME result holds even when determinism is relaxed. Negotiation games have also been considered, to decide whether one particular process can force termination of a negotiation. While this question is EXPTIME-complete in general, for sound and deterministic negotiations it becomes PTIME [12].

While it is natural to consider cost or time in negotiations (think, e.g., of the Brexit negotiation, where time is of the essence, and which we use as a running example in this paper), the original model of negotiations proposed by [6] is only qualitative. Recently, [10] proposed a framework to associate costs with the executions of negotiations, and adapted a static analysis technique based on reduction rules to compute end-to-end cost functions that are not sensitive to the scheduling of concurrent nodes. For sound *and* deterministic negotiations, the end-to-end cost can be computed in O(n·(C + n)), where n is the size of the negotiation and C the time needed to compute the cost of an execution. Requiring soundness or determinism seems perfectly reasonable, but asking for sound *and* deterministic negotiations is too restrictive: it prevents a process from waiting for the decisions of other processes to know how to proceed.

In this paper, we revisit time in negotiations. We attach time intervals to the outcomes of nodes, and we want to compute maximal and minimal execution times for negotiations that are not necessarily sound and deterministic. Since we are interested in minimal and maximal execution times, cycles in negotiations can either be bypassed or lead to an infinite maximal time. Hence, we restrict this study to acyclic negotiations. Notice that time can be modeled as a cost, following [10], and the maximal execution time of a sound and deterministic negotiation can be computed in PTIME using the algorithm from [10]. Surprisingly however, we give an example (Example 3) for which the minimal execution time cannot be computed in PTIME by this algorithm.

The first contribution of this paper shows that reachability (whether at least one run of a negotiation terminates) is NP-complete, already for (untimed) deterministic acyclic negotiations. This implies that computing the minimal or maximal execution time for deterministic (but unsound) acyclic negotiations cannot be done in PTIME (unless NP=PTIME). We characterize precisely the complexities of different decision variants (threshold, equality, etc.), with complexities ranging from (co-)NP-complete to Δ<sup>P</sup><sub>2</sub>.

We thus turn to negotiations that are sound but not necessarily deterministic. Our second contribution is a new algorithm, not based on reduction rules, to compute the maximal execution time of sound negotiations in PTIME. It is based on computing the maximal execution time of critical paths in the negotiation. However, we show that the *minimal* execution time cannot be computed in PTIME for sound negotiations (unless NP=PTIME): deciding whether the minimal execution time is lower than T is NP-complete, even for T given in unary, using a reduction from the bin packing problem. This shows that the minimal execution time is harder to compute than the maximal execution time.
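For intuition about the critical-path idea, under the simplifying assumption that the negotiation is acyclic and each outcome carries a maximal duration on the corresponding edge of its graph, the computation reduces to a longest-path problem. The following sketch uses made-up data and is not the paper's algorithm:

```python
from functools import lru_cache

# Hypothetical acyclic negotiation graph: successors of each node, and
# the maximal duration d_max of the outcome taken along each edge.
succ = {'n0': ['n1', 'n2'], 'n1': ['nf'], 'n2': ['nf'], 'nf': []}
d_max = {('n0', 'n1'): 3, ('n0', 'n2'): 5,
         ('n1', 'nf'): 4, ('n2', 'nf'): 1}

@lru_cache(None)
def latest(n):
    """Length of a longest (critical) path from n to the final node."""
    return max((d_max[(n, m)] + latest(m) for m in succ[n]), default=0)

print(latest('n0'))  # 7: the critical path is n0 -> n1 -> nf
```

Minimization is not symmetric: replacing `max` by `min` here would ignore the synchronization constraints between processes, which is one intuition for why the minimal execution time is harder.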

Our third contribution consists in defining a class in which the minimal execution time can be computed in (pseudo) PTIME. To do so, we define the class of k-layered negotiations, for fixed k: negotiations whose nodes can be organized into layers of at most k nodes at the same depth. These negotiations can be executed without remembering more than k nodes at a time. In this case, we show that computing the maximal execution time is in PTIME, even if the negotiation is neither deterministic nor sound. The algorithm, not based on reduction rules, uses the k-layer restriction to navigate the negotiation while considering only a polynomial number of configurations. For the minimal execution time, we provide a pseudo-PTIME algorithm, which is PTIME if constants are given in unary. Finally, we show that the size of the constants does matter: deciding whether the minimal execution time of a k-layered negotiation is less than T is NP-complete when T is given in binary. We show this by a reduction from the knapsack problem, yet again emphasizing that the minimal execution time of a negotiation is harder to compute than its maximal execution time.

This paper is organized as follows. Section 2 introduces the key ingredients of negotiations, determinism and soundness, recalls known results in the untimed setting, and presents our running example modeling the Brexit negotiation. Section 3 introduces time in negotiations, gives a semantics to this new model, and formalizes several decision problems on maximal and minimal durations of runs in timed negotiations. We summarize the main results of the paper in Section 4. Then, Section 5 considers timed execution problems for deterministic negotiations, Section 6 for sound negotiations, and Section 7 for layered negotiations. Proof details for the last three sections are given in an extended version of this paper [1].

## **2 Negotiations: Definitions and Brexit example**

In this section, we recall the definition of negotiations, of some subclasses (acyclic and deterministic), as well as important problems (soundness and reachability).

**Definition 1 (Negotiation [6, 10]).** *A* negotiation *over a finite set of processes* P *is a tuple* N = (N, n<sub>0</sub>, n<sub>f</sub>, X)*, where:*


**Fig. 1.** A (sound but non-deterministic) negotiation modeling Brexit.

**–** N *is a finite set of nodes; every node* n ∈ N *is a pair* (P<sub>n</sub>, R<sub>n</sub>)*, where* P<sub>n</sub> ⊆ P *is a non-empty set of processes participating in* n *and* R<sub>n</sub> *is a finite set of outcomes of* n*; we write* R *for the set of all outcomes and* r<sub>f</sub> *for the distinguished final outcome of* n<sub>f</sub>*;*
**–** n<sub>0</sub>, n<sub>f</sub> ∈ N *are the initial and final nodes;*
**–** *For all* n ∈ N*,* X<sub>n</sub>: P<sub>n</sub> × R<sub>n</sub> → 2<sup>N</sup> *is a map defining the transition relation from node* n*, with* X<sub>n</sub>(p, r) = ∅ *iff* n = n<sub>f</sub>, r = r<sub>f</sub>*. We denote by* X: N × P × R → 2<sup>N</sup> *the partial map defined on* ⋃<sub>n∈N</sub>({n} × P<sub>n</sub> × R<sub>n</sub>)*, with* X(n, p, a) = X<sub>n</sub>(p, a) *for all* p, a*.*

Intuitively, at a node n = (P<sub>n</sub>, R<sub>n</sub>) of a negotiation, all processes of P<sub>n</sub> have to agree on a common outcome r chosen from R<sub>n</sub>. Once this outcome r is chosen, every process p ∈ P<sub>n</sub> is ready to move to any node prescribed by X(n, p, r). A new node m can only start when all processes of P<sub>m</sub> are ready to move to m.

*Example 1.* We illustrate negotiations with a simplified model of the Brexit negotiation, see Figure 1. There are 3 processes, P = {EU, PM, Pa}. At first, EU decides whether to enforce a backstop in any deal (outcome backstop) or not (outcome no-backstop). In the meantime, PM decides to prorogue Pa, and Pa can choose whether or not to appeal to court (outcome court/no court). If it goes to court, then PM and Pa will spend some time in court (c-meet, defend) before PM can meet EU to agree on a deal. Otherwise, Pa goes to recess, and PM can meet EU directly. Once EU and PM have agreed on a deal, PM tries to convince Pa to vote for the deal. The final outcome is whether the deal is voted, or whether Brexit is delayed.

**Definition 2 (Deterministic negotiations).** *A process* p ∈ P *is* deterministic *iff, for every* n ∈ N *and every outcome* r *of* n*,* X(n, p, r) *is a singleton. A negotiation is* deterministic *iff all its processes are deterministic. It is* weakly non-deterministic *[9] (called weakly deterministic in [3]) iff, for every node* n*, one of the processes in* P<sub>n</sub> *is deterministic. Last, it is* very weakly non-deterministic *[9] (called weakly deterministic in [6]) iff, for every* n*, every* p ∈ P<sub>n</sub> *and every outcome* r *of* n*, there exists a deterministic process* q *such that* q ∈ P<sub>n′</sub> *for every* n′ ∈ X(n, p, r)*.*

In deterministic negotiations, once an outcome is chosen, each process knows the next node it will be involved in. In (very) weakly non-deterministic negotiations, the next node might depend on the outcomes chosen in other nodes by other processes; however, once the outcomes have been chosen for all current nodes, there is only one possible next node for each process. Observe that the class of deterministic negotiations is isomorphic to the class of free-choice workflow nets [10]. In Example 1, the Brexit negotiation is non-deterministic because process PM is non-deterministic. Indeed, consider the outcome *c-meet*: it allows two next nodes, according to whether or not the backstop is enforced, which is a decision taken by process EU.
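With X encoded as a mapping from (node, process, outcome) triples to sets of successor nodes (a hypothetical encoding of ours, not from the paper), determinism of a process is a one-line check:

```python
def deterministic_processes(X, nf):
    """Processes p such that X(n, p, r) is a singleton for every
    non-final node n and outcome r (cf. Definition 2)."""
    procs = {p for (_, p, _) in X}
    return {p for p in procs
            if all(len(succs) == 1
                   for (n, q, _), succs in X.items()
                   if q == p and n != nf)}

# Process 'q' is non-deterministic: outcome 'go' at n0 allows two nodes.
X = {('n0', 'p', 'go'): {'n1'},
     ('n0', 'q', 'go'): {'n1', 'n2'},
     ('n1', 'p', 'ok'): {'nf'}, ('n1', 'q', 'ok'): {'nf'},
     ('n2', 'q', 'ko'): {'nf'},
     ('nf', 'p', 'rf'): set(), ('nf', 'q', 'rf'): set()}
print(sorted(deterministic_processes(X, 'nf')))  # ['p']
```

The whole negotiation is deterministic exactly when this set contains every process.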

**Semantics:** A *configuration* [3] of a negotiation is a mapping M: P → 2<sup>N</sup>. Intuitively, it tells for each process p the set M(p) of nodes p is ready to engage in. The semantics of a negotiation is defined in terms of moves from one configuration to the next. The *initial* and *final* configurations M<sub>0</sub> and M<sub>f</sub> are given by M<sub>0</sub>(p) = {n<sub>0</sub>} and M<sub>f</sub>(p) = ∅, respectively, for every process p ∈ P. A configuration M *enables* node n if n ∈ M(p) for every p ∈ P<sub>n</sub>. When n is enabled, a decision at node n can occur, and the participants at this node choose an outcome r ∈ R<sub>n</sub>. The occurrence of (n, r) produces the configuration M′ given by M′(p) = X(n, p, r) for every p ∈ P<sub>n</sub> and M′(p) = M(p) for the remaining processes in P \ P<sub>n</sub>. Moving from M to M′ after choosing (n, r) is called a *step*, denoted M −(n,r)→ M′. A *run* of N is a sequence (n<sub>1</sub>, r<sub>1</sub>), (n<sub>2</sub>, r<sub>2</sub>), …, (n<sub>k</sub>, r<sub>k</sub>) such that there is a sequence of configurations M<sub>0</sub>, M<sub>1</sub>, …, M<sub>k</sub> and every (n<sub>i</sub>, r<sub>i</sub>) is a step between M<sub>i−1</sub> and M<sub>i</sub>. A run starting from the initial configuration and ending in the final configuration is called a *final run*. By definition, its last step is (n<sub>f</sub>, r<sub>f</sub>).
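The step semantics is direct to implement. A minimal sketch on a toy two-process negotiation (our own example and encoding, not Fig. 1):

```python
# Toy negotiation: processes p, q; X maps (node, process, outcome)
# to the set of nodes that process becomes ready for.
X = {('n0', 'p', 'go'): {'n1'}, ('n0', 'q', 'go'): {'nf'},
     ('n1', 'p', 'ok'): {'nf'},
     ('nf', 'p', 'rf'): set(), ('nf', 'q', 'rf'): set()}
procs = {'n0': {'p', 'q'}, 'n1': {'p'}, 'nf': {'p', 'q'}}

def enabled(M, n):
    return all(n in M[p] for p in procs[n])

def step(M, n, r):
    """One step M --(n,r)--> M': participants of n move to X(n, p, r),
    all other processes keep their ready sets."""
    assert enabled(M, n)
    return {p: (X[(n, p, r)] if p in procs[n] else M[p]) for p in M}

M = {'p': {'n0'}, 'q': {'n0'}}                    # initial configuration
for n, r in [('n0', 'go'), ('n1', 'ok'), ('nf', 'rf')]:
    M = step(M, n, r)
print(M == {'p': set(), 'q': set()})  # True: this was a final run
```

Note how `q` waits at `nf` while `p` still has to pass through `n1`: a node starts only once all its participants are ready.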

An important class of negotiations in the context of timed negotiations is that of acyclic negotiations, in which infinite sequences of steps are impossible:

**Definition 3 (Acyclic negotiations).** *The* graph *of a negotiation* N *is the labeled graph* G<sub>N</sub> = (V, E) *where* V = N *and* E = {(n, (p, r), n′) | n′ ∈ X(n, p, r)}*, with pairs of the form* (p, r) *being the labels. A negotiation is* acyclic *iff its graph is acyclic. We denote by* Paths(G<sub>N</sub>) *the set of paths in the graph of a negotiation. These paths are of the form* π = (n<sub>0</sub>, (p<sub>0</sub>, r<sub>0</sub>), n<sub>1</sub>) … (n<sub>k−1</sub>, (p<sub>k−1</sub>, r<sub>k−1</sub>), n<sub>k</sub>)*.*

The Brexit negotiation of Fig. 1 is an example of an acyclic negotiation. Despite their apparent simplicity, negotiations may express involved behaviors, as shown by the Brexit example. Indeed, two important questions in this setting are whether there is some way to reach a final node of the negotiation from (i) the initial node and (ii) any reachable node of the negotiation.

#### **Definition 4 (Soundness and Reachability).**

**–** *A negotiation* N *is* sound *iff from every configuration reachable from the initial configuration* M<sub>0</sub>*, one can reach the final configuration* M<sub>f</sub>*.*
**–** Reachability *is the question whether a given negotiation has at least one final run.*


Notice that the Brexit negotiation of Fig. 1 is sound (but not deterministic). It seems hard to preserve the important features of this negotiation while being both sound *and* deterministic. The problem of soundness has received considerable attention. We summarize the results about soundness in the next theorem:

**Theorem 1.** *Determining whether a negotiation is sound is PSPACE-complete. For (very) weakly non-deterministic negotiations, it is co-NP-complete [9]. For acyclic negotiations, it is in DP and co-NP-hard [6]. Determining whether an acyclic weakly non-deterministic negotiation is sound is in PTIME [3, 9]. Finally, deciding soundness for deterministic negotiations is NLOGSPACE-complete [9].*

Checking reachability is NP-complete, even for deterministic acyclic negotiations (surprisingly, we did not find this result stated before in the literature):

**Proposition 1.** *Reachability is NP-complete for acyclic negotiations, even if the negotiation is deterministic.*

*Proof (sketch).* One can guess a run of size ≤ |N| in polynomial time and verify whether it reaches n_f, which gives membership in NP. The hardness part comes from a reduction from 3-CNF-SAT that can be found in the proof of Theorem 3.

## *k***-Layered Acyclic Negotiations**

We introduce a new class of negotiations with good algorithmic properties, namely k-layered acyclic negotiations, for fixed k. Roughly speaking, the nodes of a k-layered acyclic negotiation can be arranged in layers, and each layer contains at most k nodes. Before giving a formal definition, we need to define the depth of nodes in N.

First, a *path* in a negotiation is a sequence of nodes n_0 ... n_ℓ such that for all i ∈ {0, ..., ℓ−1}, there exist p_i, r_i with n_{i+1} ∈ X(n_i, p_i, r_i). The *length* of the path n_0, ..., n_ℓ is ℓ. The *depth* depth(n) of a node n is the maximal length of a path from n_0 to n (recall that N is acyclic, so this number is always finite).

**Definition 5.** *An acyclic negotiation is* layered *if for every node* n*, every path reaching* n *has length* depth(n)*. An acyclic negotiation is* k-layered *if it is layered and, for every* ℓ ∈ ℕ*, there are at most* k *nodes at depth* ℓ*.*

The Brexit example of Fig. 1 is 6-layered. Notice that a layered negotiation is necessarily k-layered for some k ≤ |N| − 2. Note also that we can always transform an acyclic negotiation N into a layered acyclic negotiation N′ by adding dummy nodes: for every node m ∈ X(n, p, r) with depth(m) > depth(n) + 1, we add ℓ nodes n_1, ..., n_ℓ with ℓ = depth(m) − (depth(n) + 1) and processes P_{n_i} = {p}. We compute a new relation X′ such that X′(n, p, r) = {n_1}, X′(n_ℓ, p, r) = {m} and, for every i ∈ 1..ℓ−1, X′(n_i, p, r) = {n_{i+1}}. This transformation is polynomial: the resulting negotiation is of size up to |N| × |X| × |P|. The proof of the following theorem can be found in [1].
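The depth function and layer widths underlying Definition 5 are easy to compute; a Python sketch on a hypothetical four-node acyclic negotiation (names illustrative only):

```python
# Hypothetical acyclic negotiation given as successor sets per node
# (n0 -> a, n0 -> b, a -> nf, b -> nf).
succ = {"n0": {"a", "b"}, "a": {"nf"}, "b": {"nf"}, "nf": set()}

def depths(succ):
    """depth(n) = maximal length of a path from n0 to n.
    Naive relaxation fixpoint; it terminates because succ is acyclic."""
    depth = {n: 0 for n in succ}
    changed = True
    while changed:
        changed = False
        for n, successors in succ.items():
            for m in successors:
                if depth[m] < depth[n] + 1:
                    depth[m] = depth[n] + 1
                    changed = True
    return depth

def layer_width(succ):
    """Largest number of nodes sharing a depth; for a layered negotiation
    this is the smallest k such that it is k-layered."""
    counts = {}
    for ell in depths(succ).values():
        counts[ell] = counts.get(ell, 0) + 1
    return max(counts.values())

assert depths(succ) == {"n0": 0, "a": 1, "b": 1, "nf": 2}
assert layer_width(succ) == 2  # layers: {n0}, {a, b}, {nf}
```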

**Theorem 2.** *Let* k ∈ ℕ+*. Checking reachability or soundness for a* k*-layered acyclic negotiation* N *can be done in PTIME.*

## **3 Timed Negotiations**

In many negotiations, time is an important feature to take into account. For instance, in the Brexit example, with an initial node starting at the beginning of September 2019, there are 9 weeks to pass a deal before the October 31 deadline.

We extend negotiations by introducing timing constraints on the outcomes of nodes, inspired by timed Petri nets [14] and by the notion of negotiations with costs [10]. We use time intervals to specify lower and upper bounds on the duration of negotiations. More precisely, we attach time intervals to pairs (n, r) where n is a node and r an outcome. In the rest of the paper, we denote by I the set of intervals with endpoints that are non-negative integers or ∞. For convenience we only use closed intervals in this paper (except at ∞), but the results we show can also be extended to open intervals with some notational overhead. Intuitively, outcome r can be taken at a node n with associated time interval [a, b] only after at least a time units have elapsed since all processes contributing to n became ready to engage in n, and at most b time units after that.

**Definition 6.** *A* timed negotiation *is a pair* (N, γ) *where* N *is a negotiation and* γ : N × R → I *associates an interval with each pair* (n, r) *of node and outcome such that* r ∈ R_n*. For a given node* n *and outcome* r*, we denote by* γ−(n, r) *(resp.* γ+(n, r)*) the lower bound (resp. the upper bound) of* γ(n, r)*.*

*Example 2.* In the Brexit example, we define the following timing constraints γ. We only specify the outcome names, as the timing only depends upon them. Backstop and no-backstop both take between 1 and 2 weeks: γ(backstop) = γ(no-backstop) = [1, 2]. In case of no-court, recess takes 5 weeks, γ(recess) = [5, 5], and PM can meet the EU immediately, γ(meet) = [0, 0]. In case of court action, PM needs to spend 2 weeks in court, γ(c-meet) = [2, 2], and, depending on the court delay and decision, Pa needs between 3 (court overrules recess) and 5 (court confirms recess) weeks, γ(defend) = [3, 5]. Agreeing on a deal can take anywhere from 2 weeks to 2 years (104 weeks): γ(deal agreed) = [2, 104] (some would say infinite time is even possible!). It needs more time with the backstop, γ(deal w/backstop) = [5, 104]. All other outcomes are assumed to be immediate, i.e., associated with [0, 0].

**Semantics:** A *timed valuation* is a map μ : P → ℝ≥0 that associates a non-negative real value with every process. A *timed configuration* is a pair (M, μ) where M is a configuration and μ a timed valuation. There is a *timed step* from (M, μ) to (M′, μ′), denoted (M, μ) −(n,r)→ (M′, μ′), if (i) M −(n,r)→ M′, (ii) p ∉ P_n implies μ′(p) = μ(p), and (iii) there exists d ∈ γ(n, r) such that for all p ∈ P_n, we have μ′(p) = max_{p′∈P_n} μ(p′) + d (d is the duration of node n).

Intuitively, a timed step (M, μ) −(n,r)→ (M′, μ′) depicts a decision taken at node n, and how long each process of P_n waited in that node before taking decision (n, r). The last process to engage in n must wait for a duration contained in γ(n, r). However, other processes may spend a time greater than γ+(n, r).

A *timed run* is a sequence of steps ρ = (M_0, μ_0) −e_1→ (M_1, μ_1) ... (M_k, μ_k) where M_0 is the initial configuration, μ_0(p) = 0 for every p ∈ P, and each (M_i, μ_i) −e_{i+1}→ (M_{i+1}, μ_{i+1}) is a timed step. It is *final* if M_k = M_f. Its *execution time* δ(ρ) is defined as δ(ρ) = max_{p∈P} μ_k(p).
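For instance, replaying an untimed run with every decision taken at its earliest date γ−(n, r) yields a timed run and its execution time δ(ρ); a Python sketch with hypothetical data:

```python
# Hypothetical two-process example: both p and q engage in n0, then in nf.
Pn = {"n0": {"p", "q"}, "nf": {"p", "q"}}
gamma_lo = {("n0", "r"): 2, ("nf", "rf"): 1}   # lower bounds gamma^-(n, r)

def earliest_exec_time(run, Pn, gamma_lo, processes):
    """delta(rho) for the timed run taking each decision as early as possible."""
    mu = {p: 0 for p in processes}          # mu_0(p) = 0 for every process
    for n, r in run:
        # the last participant to arrive fixes the date; add the minimal duration
        t = max(mu[p] for p in Pn[n]) + gamma_lo[(n, r)]
        for p in Pn[n]:
            mu[p] = t
    return max(mu.values())                 # execution time delta(rho)

assert earliest_exec_time([("n0", "r"), ("nf", "rf")],
                          Pn, gamma_lo, {"p", "q"}) == 3
```

This earliest-date replay is exactly the witness-checking step used in the NP membership arguments later in the paper.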

Notice that we only attached timing to processes, not to individual steps. With our definition of runs, timing on steps may not be monotone (i.e., non-decreasing) along the run, while timing on processes is. Viewed through the lens of concurrent systems, the timing is monotone on the partial orders of the system rather than on their linearizations. It is not hard to restrict runs, if necessary, to have monotone timing on steps as well. In this paper, we are only interested in execution time, which does not depend on the linearization considered.

Given a timed negotiation N , we can now define the minimum and maximum execution time, which correspond to optimistic or pessimistic views:

**Definition 7.** *Let* N *be a timed negotiation. Its* minimum execution time*, denoted* mintime(N)*, is the minimal* δ(ρ) *over all final timed runs* ρ *of* N*. We define the* maximal execution time maxtime(N) *of* N *similarly.*

Given <sup>T</sup> <sup>∈</sup> <sup>N</sup>, the main problems we consider in this paper are the following:
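Phrased via Definition 7, the two threshold problems are (our phrasing):

```latex
\begin{aligned}
&\textbf{MinTime:}\quad \text{given } (\mathcal{N},\gamma) \text{ and } T \in \mathbb{N},\ \text{decide whether } \operatorname{mintime}(\mathcal{N}) \le T;\\
&\textbf{MaxTime:}\quad \text{given } (\mathcal{N},\gamma) \text{ and } T \in \mathbb{N},\ \text{decide whether } \operatorname{maxtime}(\mathcal{N}) \le T.
\end{aligned}
```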


These questions have a practical interest: in the Brexit example, the question "is there a way to have a vote on a deal within 9 weeks?" is indeed a minimum execution time problem. We also address the equality variants of these decision problems, i.e., mintime(N) = T: is there a final run of N that terminates in exactly T time units, while no other final run takes less than T time units? Similarly for maxtime(N) = T.

*Example 3.* We use Fig. 1 to show that it is not easy to compute the minimal execution time, and in particular that one cannot use the algorithm from [10] to compute it. Consider the node n with P_n = {PM, Pa} and R_n = {court, no-court}. If the outcome is court, then PM needs 2 weeks before (s)he can talk to the EU, and Pa needs at least 3 weeks before he can debate. However, if the outcome is no-court, then PM need not wait before (s)he can talk to the EU, but Pa wastes 5 weeks in recess. This means that one needs to remember different alternatives, each of which could be faster in the end, depending on the future. On the other hand, the algorithm from [10] attaches one minimal time to process Pa and one minimal time to process PM. No matter the choices (0 or 2 for PM, and 3 or 5 for Pa), there will be futures in which the chosen number over- or under-approximates the real minimal execution time (this choice is not explicit in [10])<sup>4</sup>.

<sup>4</sup> The authors of [10] acknowledged the issue with their algorithm for mintime.

For the maximum execution time, it is not an issue to attach a unique maximal execution time to each node. The reason for this asymmetry between the minimal and maximal execution times of a negotiation is that the execution time of a run is max_{p∈P} μ_k(p), for μ_k the last timed valuation, which breaks the symmetry between min and max.

## **4 High-Level View of the Main Results**

In this section, we give a high-level description of our main results. Formal statements can be found in the sections where they are proved. We gather in Fig. 2 the precise complexities of the minimal and maximal execution time problems for the three classes of negotiations described in the following. Since we are interested in minimum and maximum execution times, cycles in negotiations can either be bypassed or lead to an infinite maximal time. Hence, while we define timed negotiations in general, we always restrict to acyclic negotiations (such as Brexit) when stating and proving results.

In [10], a PTIME algorithm is given to compute different costs for negotiations that are both sound *and* deterministic. One limitation of this result is that it cannot compute the minimum execution time, as explained in Example 3. A second limitation is that the class of sound and deterministic negotiations is quite restrictive: it cannot model situations where the next node a process participates in depends on the outcome from another process, as in the Brexit example. We thus consider classes where one of these restrictions is dropped.

We first consider (Section 5) negotiations that are deterministic, but without the soundness restriction. We show that for this class, none of the timed problems we consider can be solved in PTIME (unless NP = PTIME). Further, we show that the equality problems (maxtime/mintime(N) = T) are complete for the complexity class DP, i.e., the second level of the Boolean Hierarchy [15].

We then consider (Section 6) the class of negotiations that are sound, but not necessarily deterministic. We show that the maximum execution time can be computed in PTIME, and propose a new algorithm. However, the minimum execution time cannot be computed in PTIME (unless NP = PTIME). Again, for the mintime equality problem we have a matching DP-completeness result.


**Fig. 2.** Results for acyclic timed negotiations. DP refers to the complexity class Difference Polynomial time [15], the second level of the Boolean Hierarchy.

(Hardness holds even for very weakly non-deterministic negotiations, and with T given in unary.)


Finally, in order to obtain a polynomial-time algorithm to compute the minimum execution time, we consider the class of k-layered negotiations (see Section 7): given k ∈ ℕ, we show that maxtime(N) can be computed in PTIME for k-layered negotiations. We also show that while the mintime(N) ≤ T? problem is weakly NP-complete for k-layered negotiations, we can compute mintime(N) in pseudo-PTIME, i.e., in PTIME if the constants are given in unary.

## **5 Deterministic Negotiations**

We start by considering the class of deterministic acyclic negotiations. We show that neither the maximal nor the minimal execution time can be computed in PTIME (unless NP = PTIME), as the threshold problems are (co-)NP-complete.

**Theorem 3.** *The* mintime(N) ≤ T *decision problem is NP-complete, and the* maxtime(N) ≤ T *decision problem is co-NP-complete for acyclic deterministic timed negotiations.*

*Proof.* For mintime(N) ≤ T, containment in NP is easy: we just need to guess a run ρ (of polynomial size, as N is acyclic), consider the associated timed run ρ− where all decisions are taken at their earliest possible dates, and check whether δ(ρ−) ≤ T, which can be done in time O(|N| + log T).

For the hardness, we give the proof in two steps. First, we prove Proposition 1, i.e., that the reachability problem is NP-hard, by a reduction from 3-CNF-SAT: given a formula φ, we build a deterministic negotiation N_φ such that φ is satisfiable iff N_φ has a final run. In a second step, we introduce timings on this negotiation and show that mintime(N_φ) ≤ T iff φ is satisfiable.

*Step 1: Reducing 3-CNF-SAT to the reachability problem.*

Given a Boolean formula φ with variables v_i, 1 ≤ i ≤ n, and clauses c_j, 1 ≤ j ≤ m, for each variable v_i we define the sets of clauses S_{i,t} = {c_j | v_i is present in c_j} and S_{i,f} = {c_j | ¬v_i is present in c_j}. Clauses in S_{i,t} and S_{i,f} are naturally ordered: c_i < c_j iff i < j. We denote these elements S_{i,t}(1) < S_{i,t}(2) < ..., and similarly for the set S_{i,f}.
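The sets S_{i,t} and S_{i,f} are easy to read off a CNF formula; an illustrative Python sketch, with clauses encoded as lists of signed variable indices (a hypothetical encoding, not from the paper):

```python
def occurrence_sets(clauses, n):
    """S[i]["t"] = ordered indices of clauses where v_i occurs positively,
    S[i]["f"] = indices where it occurs negatively. A clause is a list of
    signed ints, e.g. 3 for v_3 and -3 for the negation of v_3."""
    S = {i: {"t": [], "f": []} for i in range(1, n + 1)}
    for j, clause in enumerate(clauses, start=1):  # clauses ordered by index
        for lit in clause:
            S[abs(lit)]["t" if lit > 0 else "f"].append(j)
    return S

# phi = (v1 or not v2 or v3) and (not v1 or v2 or v3)
S = occurrence_sets([[1, -2, 3], [-1, 2, 3]], n=3)
assert S[1] == {"t": [1], "f": [2]}
assert S[3] == {"t": [1, 2], "f": []}
```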

Now, we construct a negotiation N<sup>φ</sup> (as depicted in Figure 3) with a process V<sup>i</sup> for each variable v<sup>i</sup> and a process C<sup>j</sup> for each clause c<sup>j</sup> :


**Fig. 3.** A part of N_φ where clause c_j is (v_2 ∨ ¬v_i ∨ ¬v_3) and clause c_k is (v_4 ∨ ¬v_i ∨ v_5). Timing is [0, 0] wherever not mentioned.

**–** Node Pair_{c_r,v_i} has V_i and C_r as its processes and one outcome ctof, which takes process C_r to the final node n_f and process V_i to Tlone_{v_i,j+1} (with c_r = S_{i,t}(j)), or to n_f if j = |S_{i,t}|. Node Pair_{c_r,¬v_i} is defined in the same way from Flone_{v_i,j}.

With this, we claim that N_φ has a final run iff φ is satisfiable, which completes the first step of the proof. We give a formal proof of this claim in Appendix A of [1]. Observe that the negotiation N_φ constructed is deterministic and acyclic (but it is not sound).

*Step 2*: Before we introduce timing on N_φ, we add a new outcome r′ at n_0 which takes all processes to n_f. Now, the timing function γ associated with N_φ is: γ(n_0, r) = [2, 2], γ(n_0, r′) = [3, 3], and γ(n, r) = [0, 0] for every node n ≠ n_0 and every r ∈ R_n. Then mintime(N_φ) ≤ 2 iff φ has a satisfying assignment: if mintime(N_φ) ≤ 2, there is a final run in which decision r is taken at n_0, and the existence of any such final run implies satisfiability of φ. For the reverse implication, if φ is satisfiable, then the final run corresponding to a satisfying assignment takes 2 time units, which means that mintime(N_φ) ≤ 2.

Similarly, we can prove that the maxtime problem is co-NP-complete by changing γ(n_0, r′) to [1, 1] and asking whether maxtime(N_φ) > 1 for the new N_φ. The answer is yes iff φ is satisfiable.

We now consider the related problems of checking whether mintime(N) = T (or maxtime(N) = T). These problems are harder than their threshold variants under the usual complexity assumptions: they are DP-complete (DP, the Difference Polynomial time class, is the second level of the Boolean Hierarchy; a problem is in DP iff it is the intersection of a problem in NP and one in co-NP [15]).

**Proposition 2.** *The* mintime(<sup>N</sup> ) = <sup>T</sup> *and* maxtime(<sup>N</sup> ) = <sup>T</sup> *decision problems are DP-complete for acyclic deterministic negotiations.*

*Proof.* We only give the proof for mintime (the proof for maxtime is given in Appendix A of [1]). It is easy to see that this problem is in DP, as it can be written as the conjunction of mintime(N) ≤ T, which is in NP, and ¬(mintime(N) ≤ T − 1), which is in co-NP. To show hardness, we use the negotiation constructed in the above proof as a gadget and give a reduction from the SAT-UNSAT problem (a standard DP-complete problem).

The SAT-UNSAT problem asks, given two Boolean expressions φ and φ′, both in CNF with three literals per clause, whether φ is satisfiable and φ′ is unsatisfiable. SAT-UNSAT is known to be DP-complete [15]. We reduce this problem to mintime(N) = T.

Given φ, φ′, we first build the corresponding negotiations N_φ and N_{φ′} as in the previous proof. Let n_0 and n_f be the initial and final nodes of N_φ, and n′_0 and n′_f the initial and final nodes of N_{φ′}. (Similarly, we prime the names of other nodes to signify that they belong to N_{φ′}.)

In the negotiation N_{φ′}, we introduce a new node n_all in which all the processes participate (see Figure 4). The node n_all has a single outcome r_all, which sends all the processes to n′_f. Also, for node n′_0, apart from the outcome r which sends all processes to different nodes, there is another outcome r_all which sends all the processes to n_all. Now we merge the nodes n_f and n′_0 and call the merged node n_sep. Also, nodes n_0 and n′_f now have all the processes of N_φ and N_{φ′} participating in them. This merging gives us a new negotiation N_{φ,φ′} in which the structure above n_sep is the same as N_φ, while below it is the same as N_{φ′}. Node n_sep now has all the processes of N_φ and N_{φ′} participating in it. The outcomes of n_sep are the same as those of n′_0 (r_all, r). For both outcomes of n_sep, the processes corresponding to N_φ go directly to n′_f of N_{φ,φ′}. Similarly, n_0 of N_{φ,φ′}, which is the same as n_0 of N_φ, sends the processes corresponding to N_{φ′} directly to n_sep for all its outcomes. We now define the timing function γ for N_{φ,φ′} as follows: γ(Lone′_{v_i}, r) = [1, 1] for all v_i ∈ φ′ and r ∈ {true, false}, γ(n_all, r_all) = [2, 2], and γ(n, r) = [0, 0] for all other outcomes of nodes. With this construction, one can conclude that mintime(N_{φ,φ′}) = 2 iff φ is satisfiable and φ′ is unsatisfiable (see [1] for details). This completes the reduction and hence proves DP-hardness.

**Fig. 4.** Structure of N_{φ,φ′}

Finally, we consider the related problem of computing the min and max times. To obtain a decision variant, we rephrase this problem as checking whether an arbitrary bit of the minimum execution time is 1. Perhaps surprisingly, we obtain that this problem goes even beyond DP, the second level of the Boolean Hierarchy, and is in fact hard for Δ^P_2 (the second level of the *polynomial* hierarchy), which contains the entire Boolean Hierarchy. Formally,

**Theorem 4.** *Given an acyclic deterministic timed negotiation and a positive integer* k*, computing the* k*-th bit of the maximum/minimum execution time is* Δ^P_2*-complete.*

Finally, we remark that if we were interested in the optimization variants rather than the decision variants of these problems, the above proof could be adapted to show that they are OptP-complete (as defined in [13]). As optimization is not the focus of this paper, we omit the formal details of this proof.

## **6 Sound Negotiations**

Sound negotiations are negotiations in which every run can be extended to a final run, as in Fig. 1. In this section, we show that maxtime(N) can be computed in PTIME for sound negotiations, hence giving PTIME complexity for the maxtime(N) ≤ T? and maxtime(N) = T? questions. However, we show that mintime(N) ≤ T is NP-complete for sound negotiations, and that mintime(N) = T is DP-complete, even if T is given in unary.

Consider the graph G_N of a negotiation N. Let π = (n_0, (p_0, r_0), n_1) ··· (n_k, (p_k, r_k), n_{k+1}) be a path of G_N. We define the *maximal execution time* of the path π as the value δ+(π) = Σ_{i∈0..k} γ+(n_i, r_i). We say that a path π = (n_0, (p_0, r_0), n_1) ··· (n_ℓ, (p_ℓ, r_ℓ), n_{ℓ+1}) is a path of some run ρ = (M_1, μ_1) −(n′_1,r′_1)→ ··· (M_k, μ_k) if r_0, ..., r_ℓ is a subword of r′_1, ..., r′_k.

**Lemma 1.** *Let* N *be an acyclic and sound timed negotiation. Then* maxtime(N) = max_{π∈Paths(G_N)} δ+(π) + γ+(n_f, r_f)*.*

*Proof.* Let us first prove that maxtime(N) ≥ max_{π∈Paths(G_N)} δ+(π) + γ+(n_f, r_f). Consider any path π of G_N, ending in some node n. First, as N is sound, we can build a run ρ_π such that π is a path of ρ_π and ρ_π ends in a configuration in which n is enabled. We associate with ρ_π the timed run ρ+_π which assigns to every node the latest possible execution date. We easily have δ(ρ+_π) ≥ δ+(π), and then we obtain max_{π∈Paths(G_N)} δ(ρ+_π) ≥ max_{π∈Paths(G_N)} δ+(π). As maxtime(N) is the maximal duration over all runs, it is hence necessarily greater than max_{π∈Paths(G_N)} δ(ρ+_π) + γ+(n_f, r_f).

We now prove that maxtime(N) ≤ max_{π∈Paths(G_N)} δ+(π) + γ+(n_f, r_f). Take any timed run ρ = (M_1, μ_1) −(n_1,r_1)→ ··· (M_k, μ_k) of N with a unique maximal node n_k. We show that there exists a path π of ρ such that δ(ρ) ≤ δ+(π), by induction on the length k of ρ. The initialization is trivial for k = 1. Let k ∈ ℕ. Because n_k is the unique maximal node of ρ, we have δ+(ρ) = max_{p∈P_{n_k}} μ_{k−1}(p) + γ+(n_k, r_k). We choose one p_{k−1} maximizing μ_{k−1}(p). Let ℓ < k be the maximal index of a decision involving process p_{k−1} (i.e., p_{k−1} ∈ P_{n_ℓ}). Now, consider the timed run ρ′, a subword of ρ, with n_ℓ as its unique maximal node (that is, ρ′ is ρ where the nodes n_i with i > ℓ have been removed, but also where some nodes n_i with i < ℓ have been removed if they are not causally before n_ℓ (in particular, if P_{n_i} ∩ P_{n_ℓ} = ∅)).

By definition, we have δ+(ρ) = δ+(ρ′) + γ+(n_ℓ, r_ℓ) + γ+(n_k, r_k). We apply the induction hypothesis to ρ′ and obtain a path π′ of ρ′ ending in n_ℓ such that δ+(ρ′) + γ+(n_ℓ, r_ℓ) ≤ δ+(π′). It suffices to consider the path π = π′.(n_ℓ, (p_{k−1}, r_ℓ), n_k) to prove the inductive step δ+(ρ) ≤ δ+(π) + γ+(n_k, r_k).

Thus maxtime(N) = max_ρ δ+(ρ) ≤ max_{π∈Paths(G_N)} δ+(π) + γ+(n_f, r_f).

Lemma 1 gives a way to evaluate the maximal execution time. This amounts to finding a path of maximal weight in an acyclic graph, which is a standard PTIME problem solvable by a standard max-cost computation.
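A minimal Python sketch of this max-cost computation on a hypothetical weighted DAG (each edge carries the γ+(n, r) of the outcome leaving its source node; nodes are given in topological order, with n_f last):

```python
# Hypothetical weighted DAG: edges[n] = list of (gamma^+ weight, successor).
edges = {"n0": [(2, "a"), (5, "b")], "a": [(1, "nf")], "b": [(1, "nf")]}

def maxtime(topo, edges, gplus_final):
    """Longest-path distances over a DAG scanned in topological order,
    then add gamma^+(n_f, r_f) as in Lemma 1."""
    dist = {n: 0 for n in topo}
    for n in topo:
        for w, n2 in edges.get(n, ()):
            dist[n2] = max(dist[n2], dist[n] + w)   # relax edge n -> n2
    return dist[topo[-1]] + gplus_final

# longest path n0 -> b -> nf has weight 6; with gamma^+(n_f, r_f) = 3 we get 9
assert maxtime(["n0", "a", "b", "nf"], edges, gplus_final=3) == 9
```

Each edge is relaxed once, matching the O(|N| + |X|) bound of Proposition 3.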

**Proposition 3.** *Computing the maximal execution time for an acyclic sound negotiation* N = (N, n_0, n_f, X) *can be done in time* O(|N| + |X|)*.*

A direct consequence is that the maxtime(N) ≤ T and maxtime(N) = T problems can be solved in polynomial time when N is sound. Notice that if N is deterministic but not sound, then Lemma 1 does not hold: we only have an inequality.

We now turn to mintime(<sup>N</sup> ). We show that it is strictly harder to compute for sound negotiations than maxtime(<sup>N</sup> ).

**Theorem 5.** mintime(N) ≤ T *is NP-complete in the strong sense for sound acyclic negotiations, even if* N *is very weakly non-deterministic.*

*Proof (sketch).* First, mintime(N) ≤ T can be decided in NP. Indeed, one can guess a final (untimed) run ρ of size ≤ |N|, consider ρ−, the timed run corresponding to ρ where all outcomes are taken at the earliest possible dates, compute δ(ρ−) in linear time, and check that δ(ρ−) ≤ T.

The hardness part is obtained by reduction from the **Bin Packing** problem. The reduction is similar to the Knapsack reduction that we present in Thm. 7. The difference is that we use ℓ bins in parallel, rather than 2 processes (one for the weight and one for the value). The hardness is thus strong, but the negotiation is not k-layered for a bounded k (it is (2ℓ + 1)-bounded, with ℓ depending on the input). A detailed proof is given in Appendix B of [1].

We show that mintime(N) = T is harder to decide than mintime(N) ≤ T, with a proof similar to that of Prop. 2.

**Proposition 4.** *The* mintime(<sup>N</sup> ) = <sup>T</sup>? *decision problem is DP-complete for sound acyclic negotiations, even if it is very weakly non-deterministic.*

An open question is whether the minimal execution time can be computed in PTIME if the negotiation is both sound and deterministic. The reduction from Bin Packing does not work with deterministic (and sound) negotiations.

## **7** *k***-Layered Negotiations**

In this section, we consider k-layeredness, a syntactic property that can be efficiently verified (see Section 2).

### **7.1 Algorithmic properties**

Let k be a fixed integer. We first show that the maximum execution time can be computed in PTIME for k-layered negotiations. Let N_i be the set of nodes at layer i. We define for every layer i the set S_i of subsets of nodes X ⊆ N_i which can be jointly enabled and such that for every process p, there is exactly one node n(X, p) in X with p ∈ P_{n(X,p)}. An element X of S_i is a subset of nodes that can be selected by resolving all non-determinism with an appropriate choice of outcomes. Formally, we define S_i inductively. We start with S_0 = {{n_0}}. We then define S_{i+1} from the contents of layer S_i: we have Y ∈ S_{i+1} iff ∪_{n∈Y} P_n = P and there exist X ∈ S_i and an outcome r_m ∈ R_m for every m ∈ X, such that n ∈ X(n(X, p), p, r_{n(X,p)}) for each n ∈ Y and p ∈ P_n.

**Theorem 6.** *Let* k ∈ ℕ+*. Computing the maximum execution time for a* k*-layered acyclic negotiation* N *can be done in PTIME. More precisely, the worst-case time complexity is* O(|P| · |N|^{k+1})*.*

*Proof (sketch).* The first step is to compute the sets S_i layer by layer, following the inductive definition. Each set S_i is of size at most 2^k, as |N_i| ≤ k by definition of k-layeredness. Knowing S_i, it is easy to build S_{i+1}. This takes time in O(|P| · |N|^{k+1}): we need to consider all k-tuples of outcomes for each layer, and there can be |N|^k such tuples; we need to do this for all processes (|P|) and for all layers (at most |N|).

We then keep, for each subset X ∈ S_i and each node n ∈ X, the maximal time f_i(X, n) ∈ ℕ associated with n and X. From S_{i+1} and f_i, we inductively compute f_{i+1} in the following way: for all X ∈ S_i with successor Y ∈ S_{i+1} for outcomes (r_p)_{p∈P}, we let f_{i+1}(Y, n, X) = max_{p∈P_n} f_i(X, n(X, p)) + γ+(n(X, p), r_p). If there are several choices of (r_p)_{p∈P} leading to the same Y, we take the r_p with the maximal f_i(X, n(X, p)) + γ+(n(X, p), r_p). We then define f_{i+1}(Y, n) = max_{X∈S_i} f_{i+1}(Y, n, X). Again, the initialization is trivial, with f_0({n_0}, n_0) = 0. The maximal execution time of N is f({n_f}, n_f), computed at the last layer.

We can bound the complexity more precisely by O(d(N) · C(N) · ||R||^{k*}), where d(N) is the depth of N, C(N) its number of contexts, ||R|| the maximal number of outcomes of a node, and k* its thread bound.
Consider again the Brexit example of Figure 1. We have k + 1 = 7, while the depth is d(N) = 6; the negotiation is k* = 3-thread bounded (k* is bounded by the number of processes), ||R|| = 2, and the number of contexts is at most C(N) = 4 (EU chooses to enforce the backstop or not, and Pa chooses to go to court or not).

#### **7.2 Minimal Execution Time**

As with sound negotiations, computing the minimal execution time is much harder than computing the maximal one for k-layered negotiations:

**Theorem 7.** *Let* k ≥ 6*. The* mintime(N) ≤ T *problem is NP-complete for* k*-layered acyclic negotiations, even if the negotiation is sound and very weakly non-deterministic.*

*Proof.* One can guess a final run of size ≤ |N| and compute its execution time in polynomial time. If the execution time of this final run is at most T, then we have found a final run witnessing mintime(N) ≤ T. Hence the problem is in NP.

Let us now show that the problem is NP-hard. We proceed by reduction from the **Knapsack** decision problem. Consider a set of items U = {u_1, ..., u_n} of respective values v_1, ..., v_n and weights w_1, ..., w_n, and a knapsack of maximal capacity W. The knapsack problem asks, given a value V, whether there exists a subset of items U' ⊆ U such that Σ_{u_i∈U'} v_i ≥ V and Σ_{u_i∈U'} w_i ≤ W.

**Fig. 5.** The negotiation encoding Knapsack

We build a negotiation with 2n processes P = {p_1, ..., p_{2n}}, as shown in Fig. 5. Intuitively, the processes p_i with i ≤ n will serve to encode the values of selected items as timings, while the p_i with i > n will serve to encode the weights of selected items as timings.

Concerning timing constraints for outcomes, we do the following: outcomes 0, yes and no are associated with [0, 0]. Outcome c_i is associated with [w_i, w_i], the weight of u_i. Last, outcome b_i is associated with a more complex function, such that Σ_i b_i ≤ W iff Σ_i v_i ≥ V. For that, we set [(v_max − v_i)·W / (n·v_max − V), v_max·W / (n·v_max − v_i)] for outcome b_i, where v_max is the largest value of an item, and V is the total value we want to reach at least. Also, we set [v_max·W / (n·v_max − V), v_max·W / (n·v_max − v_i)] for outcome a_i. We set T = W, the maximal weight of the knapsack.
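The arithmetic behind this choice of lower bounds can be sanity-checked numerically. The item values below are our own sample data; exact rationals avoid floating-point rounding at the boundary case Σ v_i = V:

```python
from fractions import Fraction
from itertools import combinations

vals, W, V = [4, 3, 2], 8, 6             # hypothetical item values and bounds
n, vmax = len(vals), max(vals)
D = n * vmax - V                         # the common denominator n*vmax - V

a = Fraction(vmax * W, D)                            # minimal delay of a_i
b = lambda i: Fraction((vmax - vals[i]) * W, D)      # minimal delay of b_i

for r in range(n + 1):
    for I in combinations(range(n), r):  # I = indices where "yes" was chosen
        delay = sum((b(i) if i in I else a) for i in range(n))
        # the total delay stays below W exactly when the value reaches V
        assert (delay <= W) == (sum(vals[i] for i in I) >= V)
```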

Now, consider a final run ρ in N. The only choices in ρ are the outcomes yes or no from C_1, ..., C_n. Let I be the set of indices i such that yes is the outcome taken from C_i in this run. We obtain δ(ρ) = max(Σ_{i∉I} a_i + Σ_{i∈I} b_i, Σ_{i∈I} c_i). We have δ(ρ) ≤ T = W iff Σ_{i∈I} w_i ≤ W, that is, the sum of the weights is at most W, and Σ_{i∉I} v_max·W/(n·v_max − V) + Σ_{i∈I} (v_max − v_i)·W/(n·v_max − V) ≤ W. The latter holds iff n·v_max − Σ_{i∈I} v_i ≤ n·v_max − V, i.e. Σ_{i∈I} v_i ≥ V. Hence, there exists a run ρ with δ(ρ) ≤ T = W iff there exists a set of items of weight at most W and of value at least V.

It is well known that Knapsack is only weakly NP-hard, that is, it is NP-hard only when weights/values are given in binary. This means that Thm. 7 shows that deciding whether the minimum execution time is ≤ T is NP-hard only when T is given in binary. We can actually show that for k-layered negotiations, the mintime(N) ≤ T problem can be decided in PTIME if T is given in unary (i.e. if T is not too large):

**Theorem 8.** *Let* k ∈ ℕ*. Given a* k*-layered negotiation* N *and* T *written in unary, one can decide in PTIME whether the minimum execution time of* N *is* ≤ T*. The worst-case time complexity is* O(|N| · |P| · (T · |N|)^k)*.*

*Proof.* We will remember for each layer i a set T_i of functions τ from the nodes N_i of layer i to values in {1, ..., T, ⊥}. Basically, we have τ ∈ T_i if there exists a path ρ reaching X = {n ∈ N_i | τ(n) ≠ ⊥}, and this path reaches node n ∈ X after τ(n) time units. As for S_i, for every process p there should be a unique node n(τ, p) such that p ∈ P_{n(τ,p)} and τ(n(τ, p)) ≠ ⊥. Again, it is easy to initialize T_0 = {τ_0}, with τ_0(n_0) = 0 and τ_0(n) = ⊥ for all n ≠ n_0.

Inductively, we build T_{i+1} in the following way: τ_{i+1} ∈ T_{i+1} iff there exist τ_i ∈ T_i and r_p ∈ R_{n(τ_i,p)} for all p ∈ P such that for all n with τ_{i+1}(n) ≠ ⊥, we have τ_{i+1}(n) = max_p τ_i(n(τ_i, p)) + γ^−(n(τ_i, p), r_p).

We have that the minimum execution time of N is min_{τ∈T_n} τ(n_f), for n the depth of n_f. There are at most T^k functions τ in any T_i, and there are at most |N| layers to consider, giving the complexity.
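This unary-T procedure can be sketched with the same hypothetical encoding as for the maximum-time computation, now with minimal delays γ⁻. Unlike the maximum case, a whole set of timing functions τ must be kept per layer, since different runs reach the same configuration with incomparable times:

```python
def min_exec_time_bounded(layers, n0, nf, T):
    """Theorem 8 sketch: decide whether mintime(N) <= T, T given in unary.

    layers[i] maps a configuration X (frozenset of nodes) to successors
    (Y, preds) with preds[n] = [(m, d), ...], d the minimal delay gamma-.
    Returns the minimal execution time if it is <= T, and None otherwise.
    """
    frontier = {frozenset({(n0, 0)})}        # tau_0(n0) = 0
    for trans in layers:
        nxt = set()
        for tau in frontier:
            t = dict(tau)
            X = frozenset(t)                 # nodes where tau is not bottom
            for Y, preds in trans.get(X, []):
                times = {n: max(t[m] + d for m, d in preds[n]) for n in Y}
                if all(v <= T for v in times.values()):   # prune beyond T
                    nxt.add(frozenset(times.items()))
        frontier = nxt
    reached = [dict(tau)[nf] for tau in frontier if nf in dict(tau)]
    return min(reached) if reached else None

# Toy example with two outcome choices of delay 2 or 5, then delay 1
layer1 = {frozenset(['n0']): [(frozenset(['a']), {'a': [('n0', 2)]}),
                              (frozenset(['a']), {'a': [('n0', 5)]})]}
layer2 = {frozenset(['a']): [(frozenset(['nf']), {'nf': [('a', 1)]})]}
assert min_exec_time_bounded([layer1, layer2], 'n0', 'nf', 10) == 3
assert min_exec_time_bounded([layer1, layer2], 'n0', 'nf', 2) is None
```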

As with Thm. 6, we can state the complexity more accurately as O(d(N) · C(N) · ||R||^{k*} · T^{k*−1}). The exponent k* − 1 comes from the fact that we only need to remember minimal functions τ ∈ T_i: if τ(n) ≥ τ'(n) for all n, then we do not need to keep τ in T_i. In particular, for the knapsack encoding in the proof of Thm. 7, we have k* = 3, ||R|| = 2 and C(N) = 4. Notice that if k is part of the input, then the problem is strongly NP-hard, even if T is given in unary, as e.g. encoding bin packing with k bins results in a (k + 1)-layered negotiation.

## **8 Conclusion**

In this paper, we considered timed negotiations. We believe that time is of the essence in negotiations, as exemplified by the Brexit negotiation. It is thus important to be able to compute, in a tractable way, the minimal and maximal execution times of negotiations. We showed that we can compute in PTIME the maximal execution time for acyclic negotiations that are either sound or k-layered, for k fixed. We showed that we cannot compute in PTIME the maximal execution time for negotiations that are neither sound nor k-layered, even if they are deterministic and acyclic (unless NP = PTIME). We also showed that, surprisingly, computing the minimal execution time is much harder, with strong NP-hardness results in most of the classes of negotiations, contradicting a claim in [10]. We came up with a new reasonable class of negotiations, namely k-layered negotiations, which enjoys a pseudo-PTIME algorithm to compute the minimal execution time; that is, the algorithm is PTIME when the timing constants are given in unary. We showed that this restriction is necessary, as the problem becomes NP-hard for constants given in binary, even when the negotiation is sound and very weakly non-deterministic. Whether the minimal execution time can be computed in PTIME for deterministic and sound negotiations remains open.

## **References**


16. R.H. Sloan and U.A. Buy. Reduction Rules for Time Petri Nets. *Acta Inf.*, 33(7):687–706, 1996.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Cartesian Difference Categories**

Mario Alvarez-Picallo<sup>1</sup> and Jean-Simon Pacaud Lemay<sup>2</sup>

<sup>1</sup> Department of Computer Science, University of Oxford, Oxford, UK
mario.alvarez-picallo@cs.ox.ac.uk

<sup>2</sup> Department of Computer Science, University of Oxford, Oxford, UK
jean-simon.lemay@kellogg.ox.ac.uk

**Abstract.** Cartesian differential categories are categories equipped with a differential combinator which axiomatizes the directional derivative. Important models of Cartesian differential categories include classical differential calculus of smooth functions and categorical models of the differential λ-calculus. However, Cartesian differential categories cannot account for other interesting notions of differentiation such as the calculus of finite differences or the Boolean differential calculus. On the other hand, change action models have been shown to capture these examples as well as more "exotic" examples of differentiation. However, change action models are very general and do not share the nice properties of a Cartesian differential category. In this paper, we introduce Cartesian difference categories as a bridge between Cartesian differential categories and change action models. We show that every Cartesian differential category is a Cartesian difference category, and how certain well-behaved change action models are Cartesian difference categories. In particular, Cartesian difference categories model both the differential calculus of smooth functions and the calculus of finite differences. Furthermore, every Cartesian difference category comes equipped with a tangent bundle monad whose Kleisli category is again a Cartesian difference category.

**Keywords:** Cartesian Difference Categories · Cartesian Differential Categories · Change Actions · Calculus Of Finite Differences · Stream Calculus.

## **1 Introduction**

In the early 2000s, Ehrhard and Regnier introduced the differential λ-calculus [10], an extension of the λ-calculus equipped with a differential combinator capable of taking the derivative of arbitrary higher-order functions. This development, based on models of linear logic equipped with a natural notion of "derivative" [11], sparked a wave of research into categorical models of differentiation.

One of the most notable developments in the area is the introduction of Cartesian differential categories [4] by Blute, Cockett and Seely, which provide an abstract categorical axiomatization of the directional derivative from differential

<sup>-</sup> The second author is financially supported by Kellogg College, the Oxford-Google Deep Mind Graduate Scholarship, and the Clarendon Fund.

calculus. The relevance of Cartesian differential categories lies in their ability to model both "classical" differential calculus (with the canonical example being the category of Euclidean spaces and smooth functions between them) and the differential λ-calculus (as every categorical model for it gives rise to a Cartesian differential category [14]). However, while Cartesian differential categories have proven to be an immensely successful formalism, they have, by design, some limitations. Firstly, they cannot account for certain "exotic" notions of derivative, such as the difference operator from the calculus of finite differences [16] or the Boolean differential calculus [19]. This is because the axioms of a Cartesian differential category stipulate that derivatives should be linear in their second argument (in the same way that the directional derivative is), whereas these aforementioned discrete sorts of derivative need not be. Additionally, every Cartesian differential category is equipped with a tangent bundle monad [7, 15] whose Kleisli category can be intuitively understood as a category of generalized vector fields. This Kleisli category has an obvious differentiation operator which comes close to making it a Cartesian differential category, but again fails the requirement of being linear in its second argument.

More recently, discrete derivatives have been suggested as a semantic framework for understanding incremental computation. This led to the development of change structures [6] and change actions [2]. Change action models have been successfully used to provide a model for incrementalizing Datalog programs [1], but have also been shown to model the calculus of finite differences as well as the Kleisli category of the tangent bundle monad of a Cartesian differential category. Change action models, however, are very general, lacking many of the nice properties of Cartesian differential categories: for example, addition in a change action model is not required to be commutative, even though it is commutative in most change action models. As a consequence of this generality, the tangent bundle endofunctor in a change action model can fail to be a monad.

In this work, we introduce Cartesian difference categories (Section 4.2), whose key ingredients are an infinitesimal extension operator and a difference combinator, whose axioms generalize the differential combinator axioms of a Cartesian differential category. In Section 4.3, we show that every Cartesian differential category is, in fact, a Cartesian difference category whose infinitesimal extension operator is zero, and conversely that every Cartesian difference category admits a full subcategory which is a Cartesian differential category. In Section 4.4, we show that every Cartesian difference category is a change action model, and conversely that a full subcategory of suitably well-behaved objects of a change action model is a Cartesian difference category. In Section 6, we show that every Cartesian difference category comes equipped with a monad whose Kleisli category is again a Cartesian difference category. Finally, in Section 5 we provide some examples of Cartesian difference categories; notably, the calculus of finite differences and the stream calculus.

## **2 Cartesian Differential Categories**

In this section, we briefly review Cartesian differential categories, so that the reader may compare Cartesian differential categories with the new notion of Cartesian *difference* categories which we introduce in the next section. For a full detailed introduction on Cartesian differential categories, we refer the reader to the original paper [4].

#### **2.1 Cartesian Left Additive Categories**

Here we recall the definition of Cartesian left additive categories [4] – where "additive" means being skew-enriched over commutative monoids, which in particular means that we do not assume the existence of additive inverses, i.e., "negative elements". By a Cartesian category we mean a category X with chosen finite products, where we denote the binary product of objects A and B by A × B with projection maps π0 : A × B → A and π1 : A × B → B and pairing operation ⟨−, −⟩, and the chosen terminal object by ⊤ with unique terminal maps !A : A → ⊤.

**Definition 1.** *A left additive category [4] is a category* X *such that each hom-set* X(A, B) *is a commutative monoid with addition operation* + : X(A, B) × X(A, B) → X(A, B) *and zero element (called the zero map)* 0 ∈ X(A, B)*, such that pre-composition preserves the additive structure:* (f + g) ◦ h = f ◦ h + g ◦ h *and* 0 ◦ h = 0*. A map* k *in a left additive category is additive if post-composition by* k *preserves the additive structure:* k ◦ (f + g) = k ◦ f + k ◦ g *and* k ◦ 0 = 0*. A Cartesian left additive category [4] is a Cartesian category* X *which is also a left additive category such that all projection maps* π0 : A × B → A *and* π1 : A × B → B *are additive.*

We note that the definition of a Cartesian left additive category given here is slightly different from the one found in [4], but it is indeed equivalent. By [4, Proposition 1.2.2], an equivalent axiomatization of a Cartesian left additive category is that of a Cartesian category in which every object comes equipped with a commutative monoid structure such that the projection maps are monoid morphisms. This will be important later in Section 4.2.

#### **2.2 Cartesian Differential Categories**

**Definition 2.** *A Cartesian differential category [4] is a Cartesian left additive category equipped with a differential combinator* D *of the form*

$$\frac{f:A \to B}{\mathbb{D}[f]:A \times A \to B}$$

*verifying the following coherence conditions:*

**[CD.1]** D[f + g] = D[f] + D[g] *and* D[0] = 0

**[CD.2]** D[f] ◦ ⟨x, y + z⟩ = D[f] ◦ ⟨x, y⟩ + D[f] ◦ ⟨x, z⟩ *and* D[f] ◦ ⟨x, 0⟩ = 0

**[CD.3]** D[1A] = π1 *and* D[π0] = π0 ◦ π1 *and* D[π1] = π1 ◦ π1

**[CD.4]** D[⟨f, g⟩] = ⟨D[f], D[g]⟩ *and* D[!A] = !A×A

**[CD.5]** D[g ◦ f] = D[g] ◦ ⟨f ◦ π0, D[f]⟩

**[CD.6]** D[D[f]] ◦ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = D[f] ◦ ⟨x, z⟩

**[CD.7]** D[D[f]] ◦ ⟨⟨x, y⟩, ⟨z, 0⟩⟩ = D[D[f]] ◦ ⟨⟨x, z⟩, ⟨y, 0⟩⟩

Note that here, following the more recent work on Cartesian differential categories, we've flipped the convention found in [4], so that the linear argument is in the second argument rather than in the first argument.

We highlight that by [7, Proposition 4.2], the last two axioms **[CD.6]** and **[CD.7]** have an equivalent alternative expression.

**Lemma 1.** *In the presence of the other axioms,* **[CD.6]** *and* **[CD.7]** *are equivalent to:*

**[CD.6.a]** D[D[f]] ◦ ⟨⟨x, 0⟩, ⟨0, y⟩⟩ = D[f] ◦ ⟨x, y⟩

**[CD.7.a]** D[D[f]] ◦ ⟨⟨x, y⟩, ⟨z, w⟩⟩ = D[D[f]] ◦ ⟨⟨x, z⟩, ⟨y, w⟩⟩

As a Cartesian difference category is a generalization of a Cartesian differential category, we leave the discussion of the intuition behind these axioms for Section 4.2 below. We also refer to [4, Section 4] for a term calculus which may help better understand the axioms of a Cartesian differential category. The canonical example of a Cartesian differential category is the category of real smooth functions, which we will discuss in Section 5.1. Other interesting examples can be found throughout the literature, such as categorical models of the differential λ-calculus [10, 14], the subcategory of differential objects of a tangent category [7], and the coKleisli category of a differential category [3, 4].

## **3 Change Action Models**

Change actions [1, 2] have recently been proposed as a setting for reasoning about higher-order incremental computation, based on a discrete notion of differentiation. Together with Cartesian differential categories, they provide the core ideas behind Cartesian difference categories. In this section, we quickly review change actions and change action models, in particular, to highlight where some of the axioms of a Cartesian difference category come from. For more details on change actions, we invite readers to see the original paper [2].

#### **3.1 Change Actions**

**Definition 3.** *A change action* A *in a Cartesian category* X *is a quintuple* A ≡ (A, ΔA, ⊕A, +A, 0A) *consisting of two objects* A *and* ΔA*, and three maps:*

$$\oplus\_A : A \times \Delta A \to A \qquad \quad +\_A : \Delta A \times \Delta A \to \Delta A \qquad \quad 0\_A : \top \to \Delta A$$

*such that* (ΔA, +A, 0A) *is a monoid and* ⊕<sup>A</sup> : A × ΔA → A *is an action of* ΔA *on* A*, that is, the following equalities hold:*

$$\oplus\_A \circ \langle 1\_A, 0\_A \circ !\_A \rangle = 1\_A \qquad \quad \oplus\_A \circ (1\_A \times +\_A) = \oplus\_A \circ (\oplus\_A \times 1\_{\Delta A})$$

For a change action A and given a pair of maps f : C → A and g : C → ΔA, we define f ⊕A g : C → A as f ⊕A g = ⊕A ◦ ⟨f, g⟩. Similarly, for maps h : C → ΔA and k : C → ΔA, define h +A k = +A ◦ ⟨h, k⟩. Therefore, the fact that ⊕A is an action of ΔA on A can be rewritten as:

$$1\_A \oplus\_A (0\_A \circ !\_A) = 1\_A \qquad \qquad 1\_A \oplus\_A (1\_{\Delta A} +\_A 1\_{\Delta A}) = (1\_A \oplus\_A 1\_{\Delta A}) \oplus\_A 1\_{\Delta A}$$

The intuition behind the above definition is that the monoid ΔA is a type of possible "changes" or "updates" that might be applied to A, with the monoid structure on ΔA representing the capability to compose updates.

Change actions give rise to a notion of derivative, with a distinctly "discrete" flavour. Given change actions on objects A and B, a map f : A → B can be said to be differentiable when changes to the input (in the sense of elements of ΔA) are mapped to changes to the output (that is, elements of ΔB). In the setting of incremental computation, this is precisely what it means for f to be incrementalizable, with the derivative of f corresponding to an incremental version of f.

**Definition 4.** *Let* A ≡ (A, ΔA, ⊕A, +A, 0A) *and* B ≡ (B, ΔB, ⊕B, +B, 0B) *be change actions. For a map* f : A → B*, a map ∂*[f] : A × ΔA → ΔB *is a derivative of* f *whenever the following equalities hold:*

$$\begin{array}{l} \mathbf{[CAD.1]}\ f \circ (x \oplus\_A y) = (f \circ x) \oplus\_B (\partial[f] \circ \langle x, y \rangle) \\ \mathbf{[CAD.2]}\ \partial[f] \circ \langle x, y +\_A z \rangle = (\partial[f] \circ \langle x, y \rangle) +\_B (\partial[f] \circ \langle x \oplus\_A y, z \rangle) \ \text{and} \\ \qquad \partial[f] \circ \langle x, 0\_A \circ !\_A \rangle = 0\_B \circ !\_{A \times \Delta A} \end{array}$$

The intuition for these axioms will be explained in more detail in Section 4.2 when we explain the axioms of a Cartesian difference category. Note that although there is nothing in the above definition guaranteeing that any given map has at most a single derivative, the chain rule does hold. As a corollary, differentiation is compositional and therefore the change actions in X form a category.
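As a concrete, entirely illustrative instance in code (our own toy model, not from the paper): take changes to a list to be lists of appended elements, with ⊕ given by concatenation; then `len` is differentiable into the change action of integers under addition, and both axioms can be checked directly:

```python
oplus = lambda xs, dxs: xs + dxs     # the action: apply a change to a list
plus = lambda d1, d2: d1 + d2        # composing changes; the zero change is []

f = len                              # a map into the change action (int, +, 0)
df = lambda xs, dxs: len(dxs)        # candidate derivative of len

xs, d1, d2 = [1, 2], [3], [4, 5]
# [CAD.1]: f(x (+) dx) = f(x) (+) df(x, dx)
assert f(oplus(xs, d1)) == f(xs) + df(xs, d1)
# [CAD.2]: df(x, d1 + d2) = df(x, d1) + df(x (+) d1, d2), and df(x, 0) = 0
assert df(xs, plus(d1, d2)) == df(xs, d1) + df(oplus(xs, d1), d2)
assert df(xs, []) == 0
```

In the incremental-computation reading, `df` is exactly the incremental version of `len`: it computes how the output changes from the input change alone.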

**Lemma 2.** *Whenever ∂*[f] *and ∂*[g] *are derivatives for composable maps* f *and* g *respectively, then ∂*[g] ◦ ⟨f ◦ π0, *∂*[f]⟩ *is a derivative for* g ◦ f*.*

#### **3.2 Change Action Models**

**Definition 5.** *Given a Cartesian category* X*, define its change actions category* CAct(X) *as the category whose objects are change actions in* X *and whose arrows* f : A → B *are the pairs* (f, *∂*[f])*, where* f : A → B *is an arrow in* X *and ∂*[f] : A × ΔA → ΔB *is a derivative for* f*. The identity is* (1A, π1)*, while the composition of* (f, *∂*[f]) *and* (g, *∂*[g]) *is* (g ◦ f, *∂*[g] ◦ ⟨f ◦ π0, *∂*[f]⟩)*.*

There is an obvious product-preserving forgetful functor <sup>E</sup> : CAct(X) <sup>→</sup> <sup>X</sup> sending every change action (A, ΔA, ⊕, +, 0) to its base object A and every map (f, *∂*[f]) to the underlying map f. As a setting for studying differentiation, the category CAct(X) is rather lacklustre, since there is no notion of higher derivatives, so we will instead work with change action models. Informally, a change action model consists of a rule which for every object A of X associates a change action over it, and for every map a choice of a derivative.

**Definition 6.** *A change action model is a Cartesian category* X *equipped with a product-preserving functor* α : X → CAct(X) *that is a section of the forgetful functor* E*.*

For brevity, when A is an object of a change action model, we will write ΔA, ⊕A, +A, and 0A to refer to the components of the corresponding change action α(A). Examples of change action models can be found in [2]. In particular, we highlight that a Cartesian differential category always provides a change action model. We will generalize this result, and show in Section 4.4 that a Cartesian difference category also always provides a change action model.

## **4 Cartesian Difference Categories**

In this section, we introduce *Cartesian difference categories*, which are generalizations of Cartesian differential categories. Examples of Cartesian difference categories can be found in Section 5.

#### **4.1 Infinitesimal Extensions in Left Additive Categories**

We first introduce infinitesimal extensions. An infinitesimal extension is an operator that turns a map into an "infinitesimal" version of itself – in the sense that every map coincides with its Taylor approximation on infinitesimal elements.

**Definition 7.** *A Cartesian left additive category* X *is said to have an infinitesimal extension* ε *if every homset* X(A, B) *comes equipped with a monoid morphism* <sup>ε</sup> : <sup>X</sup>(A, B) <sup>→</sup> <sup>X</sup>(A, B)*, that is,* <sup>ε</sup>(<sup>f</sup> <sup>+</sup> <sup>g</sup>) = <sup>ε</sup>(f) + <sup>ε</sup>(g) *and* <sup>ε</sup>(0) = 0*, and such that* ε(g ◦ f) = ε(g) ◦ f *and* ε(π0) = π<sup>0</sup> ◦ ε(1<sup>A</sup>×<sup>B</sup>) *and* ε(π1) = π<sup>1</sup> ◦ ε(1<sup>A</sup>×<sup>B</sup>)*.*

Note that since ε(g ◦ f) = ε(g) ◦ f, it follows that ε(f) = ε(1B) ◦ f and that ε(1A) : A → A is an additive map (Definition 1). In light of this, it turns out that infinitesimal extensions can equivalently be described as a class of additive maps εA : A → A such that εA×B = εA × εB. The equivalence is given by setting ε(f) = εB ◦ f and εA = ε(1A). Furthermore, infinitesimal extensions equip each object with a canonical change action structure:

**Lemma 3.** *Let* X *be a Cartesian left additive category with infinitesimal extension* ε*. For every object* A*, define the maps* ⊕A : A × A → A *as* ⊕A = π0 + ε(π1)*,* +A : A × A → A *as* +A = π0 + π1*, and* 0A : ⊤ → A *as* 0A = 0*. Then* (A, A, ⊕A, +A, 0A) *is a change action in* X*.*

*Proof.* As mentioned earlier, the fact that (A, +A, 0A) is a commutative monoid was shown in [4]. On the other hand, the fact that ⊕A is a monoid action follows from the fact that ε preserves addition. -

Setting A ≡ (A, A, ⊕A, +A, 0A), we note that f ⊕<sup>A</sup> g = f +ε(g) and f +<sup>A</sup> g = f + g, and so in particular +<sup>A</sup> = +. Therefore, from now on we will omit the subscripts and simply write ⊕ and +.
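As a quick sanity check of Lemma 3, consider maps over the integers with ε given by multiplication by a fixed constant (a hypothetical infinitesimal extension of our own choosing; scalar multiples are additive). The action laws then hold:

```python
c = 3                                # hypothetical "infinitesimal" scale factor
eps = lambda v: c * v                # additive: eps(u + v) = eps(u) + eps(v)
oplus = lambda a, da: a + eps(da)    # Lemma 3: oplus = pi0 + eps(pi1)

for a in range(-3, 4):
    for d1 in range(-3, 4):
        for d2 in range(-3, 4):
            assert oplus(a, 0) == a                               # unit law
            assert oplus(a, d1 + d2) == oplus(oplus(a, d1), d2)   # action law
```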

For every Cartesian left additive category, there are always at least two possible infinitesimal extensions:

**Lemma 4.** *For any Cartesian left additive category* X*, both of the following define infinitesimal extensions:* ε(f) = 0 *(the zero extension) and* ε(f) = f *(the identity extension).*
We note that while these examples of infinitesimal extensions may seem trivial, they are both very important as they will give rise to key examples of Cartesian difference categories.

#### **4.2 Cartesian Difference Categories**

**Definition 8.** *A Cartesian difference category is a Cartesian left additive category with an infinitesimal extension* ε *which is equipped with a difference combinator ∂ of the form:*

$$\frac{f:A \to B}{\partial[f]:A \times A \to B}$$

*verifying the following coherence conditions:*

**[C∂.0]** f ◦ (x + ε(y)) = f ◦ x + ε(∂[f] ◦ ⟨x, y⟩)

**[C∂.1]** ∂[f + g] = ∂[f] + ∂[g]*,* ∂[0] = 0*, and* ∂[ε(f)] = ε(∂[f])

**[C∂.2]** ∂[f] ◦ ⟨x, y + z⟩ = ∂[f] ◦ ⟨x, y⟩ + ∂[f] ◦ ⟨x + ε(y), z⟩ *and* ∂[f] ◦ ⟨x, 0⟩ = 0

**[C∂.3]** ∂[1A] = π1 *and* ∂[π0] = π0 ◦ π1 *and* ∂[π1] = π1 ◦ π1

**[C∂.4]** ∂[⟨f, g⟩] = ⟨∂[f], ∂[g]⟩ *and* ∂[!A] = !A×A

**[C∂.5]** ∂[g ◦ f] = ∂[g] ◦ ⟨f ◦ π0, ∂[f]⟩

**[C∂.6]** ∂[∂[f]] ◦ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = ∂[f] ◦ ⟨x + ε(y), z⟩

**[C∂.7]** ∂[∂[f]] ◦ ⟨⟨x, y⟩, ⟨z, 0⟩⟩ = ∂[∂[f]] ◦ ⟨⟨x, z⟩, ⟨y, 0⟩⟩

Before giving some intuition on the axioms **[C***∂***.0]** to **[C***∂***.7]**, we first observe that one could have used change action notation to express **[C***∂***.0]**, **[C***∂***.2]**, and **[C***∂***.6]** which would then be written as:

**[C∂.0]** f ◦ (x ⊕ y) = (f ◦ x) ⊕ (∂[f] ◦ ⟨x, y⟩)

**[C∂.2]** ∂[f] ◦ ⟨x, y + z⟩ = ∂[f] ◦ ⟨x, y⟩ + ∂[f] ◦ ⟨x ⊕ y, z⟩ *and* ∂[f] ◦ ⟨x, 0⟩ = 0

**[C∂.6]** ∂[∂[f]] ◦ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = ∂[f] ◦ ⟨x ⊕ y, z⟩

And also, just like Cartesian differential categories, **[C***∂***.6]** and **[C***∂***.7]** have alternative equivalent expressions.

**Lemma 5.** *In the presence of the other axioms,* **[C***∂***.6]** *and* **[C***∂***.7]** *are equivalent to:*

**[C∂.6.a]** ∂[∂[f]] ◦ ⟨⟨x, 0⟩, ⟨0, y⟩⟩ = ∂[f] ◦ ⟨x, y⟩

**[C∂.7.a]** ∂[∂[f]] ◦ ⟨⟨x, y⟩, ⟨z, w⟩⟩ = ∂[∂[f]] ◦ ⟨⟨x, z⟩, ⟨y, w⟩⟩

*Proof.* The proof is essentially the same as [7, Proposition 4.2]. -

The keen-eyed reader will notice that the axioms of a Cartesian difference category are very similar to those of a Cartesian differential category. Indeed, **[C∂.1]**, **[C∂.3]**, **[C∂.4]**, **[C∂.5]**, and **[C∂.7]** are the same as their Cartesian differential category counterparts. The axioms which differ are **[C∂.2]** and **[C∂.6]**, where the infinitesimal extension ε is now included, and there is also the new extra axiom **[C∂.0]**. On the other hand, interestingly enough, **[C∂.6.a]** is the same as **[CD.6.a]**. We also point out that, writing out **[C∂.0]** and **[C∂.2]** using change action notation, we see that these axioms are precisely **[CAD.1]** and **[CAD.2]** respectively. To better understand **[C∂.0]** to **[C∂.7]**, it may be useful to write them out using element-like notation. In element-like notation, **[C∂.0]** is written as:

$$f(x + \varepsilon(y)) = f(x) + \varepsilon\left(\partial[f](x, y)\right)$$

This condition can be read as a generalization of the Kock-Lawvere axiom that characterizes the derivative in synthetic differential geometry [13]. Broadly speaking, the Kock-Lawvere axiom states that, for any map f : R → R, any x ∈ R and d ∈ D, there exists a unique f′(x) ∈ R verifying

$$f(x+d) = f(x) + d \cdot f'(x)$$

where D is the subset of R consisting of infinitesimal elements. It is by analogy with the Kock-Lawvere axiom that we refer to ε as an "infinitesimal extension" as it can be thought of as embedding the space A into a subspace ε(A) of infinitesimal elements.

**[C∂.1]** states that the differential of a sum of maps is the sum of differentials, and similarly for zero maps and the infinitesimal extension of a map. **[C∂.2]** is the first crucial difference between a Cartesian difference category and a Cartesian differential category. In a Cartesian differential category, the differential of a map is assumed to be additive in its second argument. In a Cartesian difference category, just as with derivatives of change actions, the differential is still required to preserve zeros in its second argument, but it is only additive "up to a small perturbation", that is:

$$
\partial[f](x, y+z) = \partial[f](x, y) + \partial[f](x + \varepsilon(y), z),
$$

**[C***∂***.3]** tells us what the differential of the identity and projection maps are, while **[C***∂***.4]** says that the differential of a pairing of maps is the pairing of their differentials. **[C***∂***.5]** is the chain rule which expresses what the differential of a composition of maps is:

$$
\partial[g \circ f](x, y) = \partial[g](f(x), \partial[f](x, y))
$$

**[C***∂***.6]** and **[C***∂***.7]** tell us how to work with second order differentials. **[C***∂***.6]** is expressed as follows:

$$
\partial \left[ \partial [f] \right](x, y, 0, z) = \partial [f](x + \varepsilon(y), z),
$$

and finally **[C***∂***.7]** is expressed as:

$$
\partial \left[ \partial [f] \right](x, y, z, 0) = \partial \left[ \partial [f] \right](x, z, y, 0)
$$

It is interesting to note that while **[C***∂***.6]** is different from **[CD.6]**, its alternative version **[C***∂***.6.a]** is the same as **[CD.6.a]**.

$$
\partial \left[ \partial [f] \right]((x,0),(0,y)) = \partial [f](x,y).
$$

#### **4.3 Another Look at Cartesian Differential Categories**

Here we explain how a Cartesian differential category is a Cartesian difference category where the infinitesimal extension is given by zero.

**Proposition 1.** *Every Cartesian differential category* X *with differential combinator* D *is a Cartesian difference category where the infinitesimal extension is defined as* ε(f)=0 *and the difference combinator is defined to be the differential combinator, ∂* = D*.*

*Proof.* As noted before, the first two parts of **[C***∂***.1]**, the second part of **[C***∂***.2]**, and **[C***∂***.3]**, **[C***∂***.4]**, **[C***∂***.5]**, and **[C***∂***.7]** are precisely the same as their Cartesian differential axiom counterparts. On the other hand, since ε(f) = 0, **[C***∂***.0]** and the third part of **[C***∂***.1]** trivially state that 0 = 0, while the first part of **[C***∂***.2]** and **[C***∂***.6]** end up being precisely the first part of **[CD.2]** and **[CD.6]**. Therefore, the differential combinator satisfies the Cartesian difference axioms and we conclude that a Cartesian differential category is a Cartesian difference category. □

Conversely, one can always build a Cartesian differential category from a Cartesian difference category by considering the objects for which the infinitesimal extension is the zero map.

**Proposition 2.** *Let* X *be a Cartesian difference category with infinitesimal extension* ε *and difference combinator ∂. Then* X0*, the full subcategory of objects* A *such that* ε(1A)=0*, is a Cartesian differential category where the differential combinator is defined to be the difference combinator,* D = *∂.*

*Proof.* First note that if ε(1A) = 0 and ε(1B) = 0, then by definition it also follows that ε(1A×B) = 0, and for the terminal object ε(1)=0 by uniqueness of maps into the terminal object. Thus X0 is closed under finite products and is therefore a Cartesian left additive category. Furthermore, since ε(f) = 0 for maps between such objects, the Cartesian difference axioms are precisely the Cartesian differential axioms. Therefore, the difference combinator is a differential combinator for this subcategory, and so X0 is a Cartesian differential category. □

In any Cartesian difference category X, the terminal object always satisfies ε(1) = 0, and therefore X0 is never empty. On the other hand, applying Proposition 2 to a Cartesian differential category results in the entire category. It is also important to note that the above two propositions do not imply that if a difference combinator is a differential combinator, then the infinitesimal extension must be zero. In Section 5.3, we provide an example of a Cartesian differential category that comes equipped with a non-zero infinitesimal extension such that the differential combinator is a difference combinator with respect to this non-zero infinitesimal extension.

### **4.4 Cartesian Difference Categories as Change Action Models**

In this section, we show how every Cartesian difference category is a particularly well-behaved change action model, and conversely how every change action model contains a Cartesian difference category.

**Proposition 3.** *Let* X *be a Cartesian difference category with infinitesimal extension* ε *and difference combinator ∂. Define the functor* α : X → CAct(X) *as* α(A)=(A, A, ⊕A, +A, 0A) *(as defined in Lemma 3) and* α(f)=(f, *∂*[f])*. Then* (X, α : X → CAct(X)) *is a change action model.*

*Proof.* By Lemma 3, (A, A, ⊕A, +A, 0A) is a change action, so α is well-defined on objects. For a map f, *∂*[f] is a derivative of f in the change action sense, since **[C***∂***.0]** and **[C***∂***.2]** are precisely **[CAD.1]** and **[CAD.2]**, so α is well-defined on maps. That α preserves identities and composition follows from **[C***∂***.3]** and **[C***∂***.5]** respectively, and so α is a functor. That α preserves finite products follows from **[C***∂***.3]** and **[C***∂***.4]**. Lastly, it is clear that α is a section of the forgetful functor, and therefore we conclude that (X, α) is a change action model. □

It is clear that not every change action model is a Cartesian difference category. For example, change action models do not require the addition to be commutative. On the other hand, it can be shown that every change action model contains a Cartesian difference category as a full subcategory.

**Definition 9.** *Let* (X, α : <sup>X</sup> <sup>→</sup> CAct(X)) *be a change action model. An object* <sup>A</sup> *is flat whenever the following hold:*

$$\begin{array}{l}
\textbf{[F.1]}\ \ \Delta A = A \\
\textbf{[F.2]}\ \ \alpha(\oplus\_A) = (\oplus\_A, \oplus\_A \circ \pi\_1) \\
\textbf{[F.3]}\ \ 0 \oplus\_A (0 \oplus\_A f) = 0 \oplus\_A f \ \text{for any}\ f : U \to A \\
\textbf{[F.4]}\ \ \oplus\_A \ \text{is right-injective, that is, if}\ \oplus\_A \circ \langle f, g \rangle = \oplus\_A \circ \langle f, h \rangle \ \text{then}\ g = h
\end{array}$$

We would like to show that for any change action model (X, α), its full subcategory of flat objects, Flat<sup>α</sup>, is a Cartesian difference category. Starting with the finite product structure, since α preserves finite products, it is straightforward to see that the terminal object is flat and that if A and B are flat then so is A × B. The sum of maps f : A → B and g : A → B in Flat<sup>α</sup> is defined using the change action structure as f +B g, while the zero map 0 : A → B is 0 = 0B ◦ !A. And so we obtain that:

**Lemma 6.** Flat<sup>α</sup> *is a Cartesian left additive category.*

*Proof.* Most of the Cartesian left additive structure is straightforward. However, since the addition is not required to be commutative for arbitrary change actions, we show that the addition is commutative for flat objects. Using that ⊕B is an action, that by **[F.2]** ⊕B ◦ π1 is a derivative for ⊕B, and **[CAD.1]**, we obtain that:

$$0\_B \oplus\_B (f +\_B g) = (0\_B \oplus\_B f) \oplus\_B g = (0\_B \oplus\_B g) \oplus\_B f = 0\_B \oplus\_B (g +\_B f)$$

By **[F.4]**, ⊕B is right-injective and we conclude that f + g = g + f. □

As an immediate consequence, we note that for any change action model (X, α), since the terminal object is always flat, Flat<sup>α</sup> is never empty.

We use the action of the change action structure to define the infinitesimal extension. So for a map f : A → B in Flatα, define ε(f) : A → B as follows:

$$\varepsilon(f) = \oplus\_B \circ \langle 0\_B \circ !\_A, f \rangle = 0 \oplus\_B f$$

**Lemma 7.** ε *is an infinitesimal extension for* Flatα*.*

*Proof.* We show that ε preserves addition. Following the same idea as in the proof of Lemma 6, we obtain the following:

$$\begin{aligned}
0\_B \oplus\_B \varepsilon(f +\_B g) &= 0\_B \oplus\_B \left( 0\_B \oplus\_B (f +\_B g) \right) \\
&= \left( 0\_B \oplus\_B 0\_B \right) \oplus\_B \left( \left( 0\_B \oplus\_B f \right) \oplus\_B g \right) = \left( 0\_B \oplus\_B \left( 0\_B \oplus\_B f \right) \right) \oplus\_B \left( 0\_B \oplus\_B g \right) \\
&= \left( 0\_B \oplus\_B \varepsilon(f) \right) \oplus\_B \varepsilon(g) = 0\_B \oplus\_B \left( \varepsilon(f) +\_B \varepsilon(g) \right)
\end{aligned}$$

Then by **[F.3]**, it follows that ε(f + g) = ε(f) + ε(g). The remaining infinitesimal extension axioms are proven in a similar fashion. □

Lastly, the difference combinator for Flat<sup>α</sup> is defined in the obvious way, that is, *∂*[f] is defined as the second component of α(f).

**Proposition 4.** *Let* (X, α : <sup>X</sup> <sup>→</sup> CAct(X)) *be a change action model. Then* Flat<sup>α</sup> *is a Cartesian difference category.*

*Proof (Sketch).* The full calculations will appear in an upcoming extended journal version of this paper, but we give an informal explanation. **[C***∂***.0]** and **[C***∂***.2]** are straightforward consequences of **[CAD.1]** and **[CAD.2]**. **[C***∂***.3]** and **[C***∂***.4]** follow trivially from the fact that α preserves finite products and from the structure of products in CAct(X), while **[C***∂***.5]** follows from composition in CAct(X). **[C***∂***.1]**, **[C***∂***.6]** and **[C***∂***.7]** are obtained by mechanical calculation in the spirit of Lemma 6. Note that every axiom except for **[C***∂***.6]** can be proven without using **[F.3]**. □

#### **4.5 Linear Maps and** *ε***-Linear Maps**

An important subclass of maps in a Cartesian differential category is the subclass of *linear maps* [4, Definition 2.2.1]. One can also define linear maps in a Cartesian difference category by using the same definition.

**Definition 10.** *In a Cartesian difference category, a map* f *is linear if the following equality holds: ∂*[f] = f ◦ π1*.*

Using element-like notation, a map f is linear if *∂*[f](x, y) = f(y). Linear maps in a Cartesian difference category satisfy many of the same properties found in [4, Lemma 2.2.2].

**Lemma 8.** *In a Cartesian difference category,*


Using element-like notation, the first point of the above lemma says that if f is linear then f(ε(x)) = ε(f(x)). And while all linear maps are additive, the converse is not necessarily true; see [4, Corollary 2.3.4]. However, an immediate consequence of the above lemma is that the subcategory of linear maps of a Cartesian difference category has finite biproducts.

Another interesting subclass of maps is the subclass of ε-linear maps, which are maps whose infinitesimal extension is linear.

**Definition 11.** *In a Cartesian difference category, a map* f *is* ε*-linear if* ε(f) *is linear.*

**Lemma 9.** *In a Cartesian difference category,*


Using element-like notation, the first point of the above lemma says that if f is ε-linear then f(x + ε(y)) = f(x) + ε(f(y)). So ε-linear maps are additive on "infinitesimal" elements (i.e. those of the form ε(y)).

For a Cartesian differential category, linear maps in the Cartesian difference category sense are precisely linear maps in the Cartesian differential category sense [4, Definition 2.2.1], while every map is ε-linear since ε = 0.

## **5 Examples of Cartesian Difference Categories**

#### **5.1 Smooth Functions**

Every Cartesian differential category is a Cartesian difference category where the infinitesimal extension is zero. As a particular example, we consider the category of real smooth functions, which as mentioned above, can be considered to be the canonical (and motivating) example of a Cartesian differential category.

Let R be the set of real numbers and let SMOOTH be the category whose objects are Euclidean spaces R<sup>n</sup> (including the point R<sup>0</sup> = {∗}), and whose maps are smooth functions F : R<sup>n</sup> → R<sup>m</sup>. SMOOTH is a Cartesian left additive category where the product structure is given by the standard Cartesian product of Euclidean spaces and where the additive structure is defined by point-wise addition, (F + G)(*x*) = F(*x*) + G(*x*) and 0(*x*) = (0,..., 0), where *x* ∈ R<sup>n</sup>. SMOOTH is a Cartesian differential category where the differential combinator is given by the directional derivative of smooth functions. Explicitly, for a smooth function F : R<sup>n</sup> → R<sup>m</sup>, which is in fact a tuple F = (f1,...,fm) of smooth functions fj : R<sup>n</sup> → R, D[F] : R<sup>n</sup> × R<sup>n</sup> → R<sup>m</sup> is defined as follows:

$$\mathbb{D}[F]\left(x,y\right) := \left(\sum\_{i=1}^n \frac{\partial f\_1}{\partial x\_i}(x)\,y\_i, \dots, \sum\_{i=1}^n \frac{\partial f\_m}{\partial x\_i}(x)\,y\_i\right)$$

where *x* = (x1,...,xn), *y* = (y1,...,yn) ∈ R<sup>n</sup>. Alternatively, D[F] can also be defined in terms of the Jacobian matrix of F. Therefore SMOOTH is a Cartesian difference category with infinitesimal extension ε = 0 and with difference combinator D. Since ε = 0, the induced action is simply *x* ⊕ *y* = *x*. Also, a smooth function is linear in the Cartesian difference category sense precisely when it is R-linear in the classical sense, and every smooth function is ε-linear.
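As an illustrative sanity check (our own, not the paper's), the following sketch compares the sum-of-partials formula for D[F] against a numerical directional derivative for the hypothetical sample function F(x1, x2) = (x1·x2, x1²):

```python
# Illustrative check of the directional-derivative formula for
# F(x1, x2) = (x1 * x2, x1**2) : R^2 -> R^2, whose Jacobian is
#   [[x2, x1], [2*x1, 0]],
# so D[F]((x1,x2),(y1,y2)) = (x2*y1 + x1*y2, 2*x1*y1).
# Function names are ours; the numerical approximation is a sketch only.

def F(x):
    x1, x2 = x
    return (x1 * x2, x1 ** 2)

def D_F(x, y):
    """D[F](x, y), computed analytically from the Jacobian of F."""
    x1, x2 = x
    y1, y2 = y
    return (x2 * y1 + x1 * y2, 2 * x1 * y1)

def D_F_numeric(x, y, h=1e-6):
    """Finite-difference approximation (F(x + h*y) - F(x)) / h."""
    xp = tuple(xi + h * yi for xi, yi in zip(x, y))
    return tuple((a - b) / h for a, b in zip(F(xp), F(x)))

x, y = (3.0, 5.0), (1.0, 2.0)
exact = D_F(x, y)            # (5*1 + 3*2, 2*3*1) = (11.0, 6.0)
approx = D_F_numeric(x, y)   # agrees with exact up to ~1e-5
print(exact, approx)
```

The agreement between `exact` and `approx` is the point-wise content of the differential combinator D on SMOOTH.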

#### **5.2 Calculus of Finite Differences**

Here we explain how the difference operator from the calculus of finite differences gives an example of a Cartesian difference category but *not* a Cartesian differential category. This example was the main motivating example for developing Cartesian difference categories. The calculus of finite differences is captured by the category of abelian groups and arbitrary set functions between them.

Let Ab be the category whose objects are abelian groups G (where we use additive notation for the group structure) and where a map f : G → H is simply an arbitrary function between them (and therefore does not necessarily preserve the group structure). Ab is a Cartesian left additive category where the product structure is given by the standard Cartesian product of sets and where the additive structure is again given by point-wise addition, (f + g)(x) = f(x) + g(x) and 0(x) = 0. Ab is a Cartesian difference category where the infinitesimal extension is simply given by the identity, that is, ε(f) = f, and where the difference combinator *∂* is defined as follows for a map f : G → H:

$$
\partial[f](x,y) = f(x+y) - f(x)
$$

On the other hand, *∂* is not a differential combinator for Ab since it does not satisfy **[CD.6]** or part of **[CD.2]**. Thanks to the addition of the infinitesimal extension, *∂* does satisfy **[C***∂***.2]** and **[C***∂***.6]**, as well as **[C***∂***.0]**. However, as noted in [5], this *∂* does satisfy **[CD.1]**, the second part of **[CD.2]**, **[CD.3]**, **[CD.4]**, **[CD.5]**, **[CD.7]**, and **[CD.6.a]**. It is worth noting that in [5], the goal was to drop the addition and develop a "non-additive" version of Cartesian differential categories.
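The following sketch (our own illustration, with ε the identity and the group taken to be the integers) checks **[C∂.0]** and the perturbed additivity axiom **[C∂.2]** for this finite-difference combinator, and shows that plain additivity in the second argument fails:

```python
# Sketch of the finite-difference combinator on the abelian group Z:
#   d[f](x, y) = f(x + y) - f(x),  with infinitesimal extension eps = identity.
# The function names are ours, chosen for illustration.

def d(f):
    """Difference combinator: d[f](x, y) = f(x + y) - f(x)."""
    return lambda x, y: f(x + y) - f(x)

f = lambda x: x * x  # an arbitrary (non-homomorphism) map Z -> Z

x, y, z = 3, 4, 5

# [Cd.0] with eps = id: f(x + y) = f(x) + d[f](x, y)
assert f(x + y) == f(x) + d(f)(x, y)

# [Cd.2]: additivity up to perturbation,
#   d[f](x, y + z) = d[f](x, y) + d[f](x + y, z)
assert d(f)(x, y + z) == d(f)(x, y) + d(f)(x + y, z)

# ...but plain additivity in the second argument FAILS, so [CD.2] does not hold:
print(d(f)(x, y + z), d(f)(x, y) + d(f)(x, z))  # → 135 95
```

The final line makes concrete why Ab is a Cartesian difference category but not a Cartesian differential category.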

In Ab, since the infinitesimal extension is given by the identity, the induced action is simply the addition, x ⊕G y = x + y. On the other hand, the linear maps in Ab are precisely the group homomorphisms. Indeed, f is linear if *∂*[f](x, y) = f(y). But by **[C***∂***.0]** and **[C***∂***.2]**, we get that:

$$f(x+y) = f(x) + \partial[f](x,y) = f(x) + f(y) \qquad \qquad f(0) = \partial[f](x,0) = 0$$

So f is a group homomorphism. Conversely, if f is a group homomorphism:

$$
\partial[f](x,y) = f(x+y) - f(x) = f(x) + f(y) - f(x) = f(y)
$$

So f is linear. Since ε(f) = f, the ε-linear maps are precisely the linear maps.

#### **5.3 Module Morphisms**

Here we provide a simple example of a Cartesian difference category whose difference combinator is also a differential combinator, but where the infinitesimal extension is neither zero nor the identity.

Let R be a commutative semiring and let MOD<sub>R</sub> be the category of R-modules and R-linear maps between them. MOD<sub>R</sub> has finite biproducts and is therefore a Cartesian left additive category where every map is additive. Every r ∈ R induces an infinitesimal extension ε<sub>r</sub> defined by scalar multiplication, ε<sub>r</sub>(f)(m) = rf(m). Then MOD<sub>R</sub> is a Cartesian difference category with the infinitesimal extension ε<sub>r</sub> for any r ∈ R and difference combinator *∂* defined as:

$$
\partial [f](m, n) = f(n).
$$

R-linearity of f assures that **[C***∂***.0]** holds, while the remaining Cartesian difference axioms hold trivially. In fact, *∂* is also a differential combinator and therefore MOD<sup>R</sup> is also a Cartesian differential category. The induced action is given by m ⊕<sup>M</sup> n = m + rn. By definition of *∂*, every map in MOD<sup>R</sup> is linear, and by definition of ε<sup>r</sup> and R-linearity, every map is also ε-linear.
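As a quick illustration (our own sketch, with the hypothetical choice R = Z and r = 2), the snippet below checks **[C∂.0]** for this difference structure:

```python
# Sketch of the MOD_R example with the hypothetical choice R = Z and r = 2:
# maps are Z-linear, eps_r(f)(m) = r*f(m), the difference combinator is
# d[f](m, n) = f(n), and the induced action is m (+) n = m + r*n.
# All names are ours, for illustration only.

r = 2
f = lambda m: 5 * m                # an R-linear map f : Z -> Z

d_f = lambda m, n: f(n)            # difference combinator d[f](m, n) = f(n)
action = lambda m, n: m + r * n    # induced action m (+)_M n = m + r*n

# [Cd.0] in this setting: f(m (+) n) = f(m) + r * d[f](m, n),
# which holds precisely because f is R-linear.
m, n = 3, 4
assert f(action(m, n)) == f(m) + r * d_f(m, n)
print(f(action(m, n)))  # → 55
```

The check goes through only because f is R-linear, matching the remark above.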

#### **5.4 Stream Calculus**

Here we show how one can extend the calculus of finite differences example to stream calculus. The differential calculus of causal functions and interesting applications thereof have recently been studied in [17, 18].

For a set A, let A<sup>ω</sup> denote the set of infinite sequences of elements of A, where we write [ai] for the infinite sequence [ai] = (a0, a1, a2,...) and ai:j for the (finite) subsequence (ai, ai+1,...,aj). A function f : A<sup>ω</sup> → B<sup>ω</sup> is **causal** whenever the n-th element f([ai])n of the output sequence only depends on the first n elements of [ai], that is, f is causal if and only if whenever a0:n = b0:n then f([ai])0:n = f([bi])0:n. We now consider streams over abelian groups, so let Ab<sup>ω</sup> be the category whose objects are all the abelian groups and whose morphisms from G to H are causal maps from G<sup>ω</sup> to H<sup>ω</sup>. Ab<sup>ω</sup> is a Cartesian left additive category, where the product is given by the standard product of abelian groups and where the additive structure is lifted point-wise from the structure of Ab, that is, (f + g)([ai])n = f([ai])n + g([ai])n and 0([ai])n = 0. In order to define the infinitesimal extension, we first need to define the truncation operator **z**. So let G be an abelian group and [ai] ∈ G<sup>ω</sup>, then define the sequence **z**([ai]) as:

$$\mathbf{z}([a\_i])\_0 = 0 \qquad\qquad\qquad \mathbf{z}\left([a\_i]\right)\_{n+1} = a\_{n+1}$$

The category Ab<sup>ω</sup> is a Cartesian difference category where the infinitesimal extension is given by the truncation operator, ε(f)([ai]) = **z**(f([ai])), and where the difference combinator *∂* is defined as follows:

$$\begin{aligned} \partial[f]\left(\left[a\_i\right], \left[b\_i\right]\right)\_0 &= f\left(\left[a\_i\right] + \left[b\_i\right]\right)\_0 - f\left(\left[a\_i\right]\right)\_0\\ \partial[f]\left(\left[a\_i\right], \left[b\_i\right]\right)\_{n+1} &= f\left(\left[a\_i\right] + \mathbf{z}\left(\left[b\_i\right]\right)\right)\_{n+1} - f\left(\left[a\_i\right]\right)\_{n+1} \end{aligned}$$

Note the similarities between the difference combinator on Ab and that on Ab<sup>ω</sup>. The induced action can be computed to be:

$$([a\_i] \oplus [b\_i])\_0 = a\_0 \qquad \qquad ([a\_i] \oplus [b\_i])\_{n+1} = a\_{n+1} + b\_{n+1}$$

A causal map is linear (in the Cartesian difference category sense) if and only if it is a group homomorphism, while a causal map f is ε-linear if and only if it is a group homomorphism that does not depend on the 0-th term of its input, that is, f([ai]) = f(**z**([ai])).
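To make this concrete (our own sketch, not code from the paper), the snippet below models streams by finite prefixes and implements the truncation operator **z** and the difference combinator above for a causal but non-homomorphic map:

```python
# Streams are modelled here by finite prefixes (Python lists of ints);
# this is an illustrative sketch of the truncation operator z and the
# difference combinator on Ab^omega, not code from the paper.

def z(a):
    """Truncation: z(a)_0 = 0, z(a)_{n+1} = a_{n+1}."""
    return [0] + a[1:]

def add(a, b):
    """Point-wise addition of stream prefixes."""
    return [x + y for x, y in zip(a, b)]

def diff(f, a, b):
    """d[f](a, b)_0     = f(a + b)_0     - f(a)_0
       d[f](a, b)_{n+1} = f(a + z(b))_{n+1} - f(a)_{n+1}"""
    plain = f(add(a, b))
    trunc = f(add(a, z(b)))
    base = f(a)
    return [plain[0] - base[0]] + [t - c for t, c in zip(trunc[1:], base[1:])]

# A causal (but non-homomorphic) map: point-wise squaring.
square = lambda a: [x * x for x in a]

a, b = [1, 2, 3], [10, 20, 30]
print(diff(square, a, b))  # → [120, 480, 1080]
```

The induced action a ⊕ b from the display above is exactly `add(a, z(b))` in this encoding: it keeps a's head and adds the tails point-wise.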

## **6 Tangent Bundles in Cartesian Difference Categories**

In this section, we show that the difference combinator of a Cartesian difference category induces a monad, called the *tangent monad*, whose Kleisli category is again a Cartesian difference category. This construction is a generalization of the tangent monad for Cartesian differential categories [7, 15]. However, the Kleisli category of the tangent monad of a Cartesian differential category is *not* a Cartesian differential category, but rather a Cartesian difference category.

#### **6.1 The Tangent Bundle Monad**

Let X be a Cartesian difference category with infinitesimal extension ε and difference combinator *∂*. Define the functor T : X → X as follows:

$$\mathsf{T}(A) = A \times A \qquad \qquad \mathsf{T}(f) = \langle f \circ \pi\_0, \partial[f] \rangle$$

and define the natural transformations η : 1<sub>X</sub> ⇒ T and μ : T<sup>2</sup> ⇒ T as follows:

$$\eta\_A := \langle 1\_A, 0 \rangle \qquad \mu\_A := \langle \pi\_0 \circ \pi\_0, \pi\_1 \circ \pi\_0 + \pi\_0 \circ \pi\_1 + \varepsilon(\pi\_1 \circ \pi\_1) \rangle$$

**Proposition 5.** (T, μ, η) *is a monad.*

*Proof.* Functoriality of T follows from **[C***∂***.3]** and the chain rule **[C***∂***.5]**. Naturality of η and μ and the monad identities follow from the remaining difference combinator axioms. The full lengthy brute-force calculations will appear in an upcoming extended journal version of this paper. □

When X is a Cartesian differential category with the difference structure arising from setting ε = 0, this tangent bundle monad coincides with the standard tangent monad corresponding to its tangent category structure [7, 15].

### **6.2 The Kleisli Category of T**

Recall that the Kleisli category of the monad (T, μ, η) is defined as the category X<sup>T</sup> whose objects are the objects of X, and where a map A → B in X<sup>T</sup> is a map f : A → T(B) in X, that is, a pair f = ⟨f0, f1⟩ where fj : A → B. The identity map in X<sup>T</sup> is the monad unit η<sub>A</sub> : A → T(A), while the composition of Kleisli maps f : A → T(B) and g : B → T(C) is defined as the composite μ<sub>C</sub> ◦ T(g) ◦ f. To distinguish between composition in X and X<sup>T</sup>, we denote Kleisli composition as g ◦<sup>T</sup> f = μ<sub>C</sub> ◦ T(g) ◦ f. If f = ⟨f0, f1⟩ and g = ⟨g0, g1⟩, then their Kleisli composition can be explicitly computed out to be:

$$g \circ^{\sf T} f = \langle g\_0, g\_1 \rangle \circ^{\sf T} \langle f\_0, f\_1 \rangle = \langle g\_0 \circ f\_0, \partial[g\_0] \circ \langle f\_0, f\_1 \rangle + g\_1 \circ (f\_0 + \varepsilon(f\_1)) \rangle$$

Kleisli maps can be understood as "generalized" vector fields. Indeed, T(A) should be thought of as the tangent bundle over A, and therefore a vector field would be a map 1, f : A → T(A), which is of course also a Kleisli map. For more details on the intuition behind this Kleisli category see [7]. We now wish to explain how the Kleisli category is again a Cartesian difference category.
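The explicit formula can be checked concretely. The sketch below (our own illustration, with hypothetical names) implements the tangent monad over Ab on the integers, where ε is the identity, and verifies that the formula above agrees with μC ◦ T(g) ◦ f on sample Kleisli maps:

```python
# Sketch of the tangent monad over Ab (here on Z, with eps = identity):
# a Kleisli map A -> T(B) is a pair (f0, f1) of functions, and
#   g oT f = (g0 . f0, d[g0](f0, f1) + g1 . (f0 + eps(f1))).
# We check this explicit formula against mu . T(g) . f on sample inputs.
# All names are ours, for illustration only.

def d(h):
    """Difference combinator on Ab: d[h](x, y) = h(x + y) - h(x)."""
    return lambda x, y: h(x + y) - h(x)

def kleisli(g, f):
    """Explicit formula for g oT f, for Kleisli maps f = (f0, f1), g = (g0, g1)."""
    f0, f1 = f
    g0, g1 = g
    return (lambda x: g0(f0(x)),
            lambda x: d(g0)(f0(x), f1(x)) + g1(f0(x) + f1(x)))  # eps = id

def T(g):
    """T(g) = <g . pi0, d[g]> : T(B) -> T^2(C), for g = (g0, g1)."""
    g0, g1 = g
    return lambda x, y: ((g0(x), g1(x)), (d(g0)(x, y), d(g1)(x, y)))

def mu(w):
    """mu_C : T^2(C) -> T(C): <pi0 pi0, pi1 pi0 + pi0 pi1 + eps(pi1 pi1)>."""
    (c00, c01), (c10, c11) = w
    return (c00, c01 + c10 + c11)  # eps = id on Ab

f = (lambda x: x * x, lambda x: 3 * x)      # a Kleisli map Z -> T(Z)
g = (lambda x: x + 7, lambda x: x * x * x)  # another Kleisli map

for x in range(-3, 4):
    lhs = tuple(h(x) for h in kleisli(g, f))
    rhs = mu(T(g)(f[0](x), f[1](x)))
    assert lhs == rhs
print("explicit Kleisli formula agrees with mu . T(g) . f")
```

The agreement rests on the identity ∂[g1](b0, b1) = g1(b0 + b1) − g1(b0), which holds in Ab since ε is the identity.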

We begin by exhibiting the Cartesian left additive structure of the Kleisli category. The product of objects in X<sup>T</sup> is defined as A × B with projections π0<sup>T</sup> : A × B → T(A) and π1<sup>T</sup> : A × B → T(B) defined respectively as π0<sup>T</sup> = ⟨π0, 0⟩ and π1<sup>T</sup> = ⟨π1, 0⟩. The pairing of Kleisli maps f = ⟨f0, f1⟩ and g = ⟨g0, g1⟩ is defined as ⟨f, g⟩<sup>T</sup> = ⟨⟨f0, g0⟩, ⟨f1, g1⟩⟩. The terminal object is again the terminal object of X, and the unique map to it is !A<sup>T</sup> = 0. The sum of Kleisli maps f = ⟨f0, f1⟩ and g = ⟨g0, g1⟩ is defined as f +<sup>T</sup> g = f + g = ⟨f0 + g0, f1 + g1⟩, and the zero Kleisli map is simply 0<sup>T</sup> = 0 = ⟨0, 0⟩. Therefore we conclude that the Kleisli category of the tangent monad is a Cartesian left additive category.

**Lemma 10.** X<sup>T</sup> *is a Cartesian left additive category.*

The infinitesimal extension ε<sup>T</sup> for the Kleisli category is defined as follows for a Kleisli map f = ⟨f0, f1⟩:

$$
\varepsilon^{\mathsf{T}}(f) = \langle 0, f\_0 + \varepsilon(f\_1) \rangle
$$

**Lemma 11.** ε<sup>T</sup> *is an infinitesimal extension on* X<sup>T</sup>*.*

It is interesting to point out that for an object A the induced action ⊕A<sup>T</sup> can be computed out to be:

$$\oplus\_A^{\mathsf{T}} = \pi\_0^{\mathsf{T}} +^{\mathsf{T}} \varepsilon^{\mathsf{T}}(\pi\_1^{\mathsf{T}}) = \langle \pi\_0, 0 \rangle + \langle 0, \pi\_1 \rangle = \langle \pi\_0, \pi\_1 \rangle = 1\_{\mathsf{T}(A)}$$

and we stress that this is the identity of T(A) in the base category X (but not in the Kleisli category).

To define the difference combinator for the Kleisli category, first note that difference combinators by definition do not change the codomain. That is, if f : A → T(B) is a Kleisli arrow, then the type of its derivative *qua* Kleisli arrow should be A × A → T(B), which coincides with the type of its derivative in X. Therefore, the difference combinator *∂*<sup>T</sup> for the Kleisli category can be defined to be the difference combinator of the base category, that is, for a Kleisli map f = ⟨f0, f1⟩:

$$
\partial^{\mathsf{T}}[f] = \partial[f] = \langle \partial[f\_0], \partial[f\_1] \rangle
$$

**Proposition 6.** *For a Cartesian difference category* X*, the Kleisli category* X<sup>T</sup> *is a Cartesian difference category with infinitesimal extension* ε<sup>T</sup> *and difference combinator ∂*<sup>T</sup>*.*

*Proof.* The full lengthy brute force calculations will appear in an upcoming extended journal version of this paper. We do note that a crucial identity for this proof is that for any map f in X, the following equality holds:

$$\mathsf{T}(\partial[f]) = \partial\left[\mathsf{T}(f)\right] \circ \langle \pi\_0 \times \pi\_0, \pi\_1 \times \pi\_1 \rangle$$

This helps simplify many of the calculations for the difference combinator axioms since T(*∂*[f]) appears everywhere due to the definition of Kleisli composition. □

As a result, the Kleisli category of a Cartesian difference category is again a Cartesian difference category, whose infinitesimal extension is neither the identity nor the zero map. This allows one to build numerous examples of interesting and exotic Cartesian difference categories, such as the Kleisli category of a Cartesian differential category (or, iterating this process, the Kleisli category of the Kleisli category). We highlight the importance of this construction in the Cartesian differential case, as it does not in general result in a Cartesian differential category. Indeed, even if ε = 0, it is never the case that ε<sup>T</sup> = 0. We conclude this section by taking a look at the linear maps and the ε<sup>T</sup>-linear maps in the Kleisli category. A Kleisli map f = ⟨f0, f1⟩ is linear in the Kleisli category if *∂*<sup>T</sup>[f] = f ◦<sup>T</sup> π1<sup>T</sup>, which amounts to requiring that:

$$\langle \partial[f\_0], \partial[f\_1] \rangle = \langle f\_0 \circ \pi\_1, f\_1 \circ \pi\_1 \rangle$$

Therefore a Kleisli map is linear in the Kleisli category if and only if it is the pairing of maps which are linear in the base category. On the other hand, f is ε<sup>T</sup>-linear if ε<sup>T</sup>(f) = ⟨0, f0 + ε(f1)⟩ is linear in the Kleisli category, which in this case amounts to requiring that f0 + ε(f1) is linear. Therefore, if f0 is linear and f1 is ε-linear, then f is ε<sup>T</sup>-linear.

## **7 Conclusions and Future Work**

We have presented Cartesian difference categories, which generalize Cartesian differential categories to account for more discrete definitions of derivatives while providing an additional structure that is absent in change action models. We have also exhibited important examples and shown that Cartesian difference categories arise quite naturally from considering tangent bundles in any Cartesian differential category. We claim that Cartesian difference categories can facilitate the exploration of differentiation in discrete spaces, by generalizing techniques and ideas from the study of their differential counterparts. For example, Cartesian differential categories can be extended to allow objects whose tangent space is not necessarily isomorphic to the object itself [9]. The same generalization could be applied to Cartesian difference categories – with some caveats: for example, the equation defining a linear map (Definition 10) becomes ill-typed, but the notion of ε-linear map remains meaningful.

Another relevant path to consider is developing the analogue of the "tensor" story for Cartesian difference categories. Indeed, an important source of examples of Cartesian differential categories are the coKleisli categories of a tensor differential category [3, 4]. A similar result likely holds for a hypothetical "tensor difference category", but it is not clear how these should be defined: [**C***∂***.2]** implies that derivatives in the difference sense are non-linear and therefore their interplay with the tensor structure will be much different.

Categories with tangent structure [7], a further generalization of Cartesian differential categories, are defined directly in terms of a tangent bundle functor rather than requiring that every tangent bundle be trivial (that is, in a tangent category it may not be the case that TA = A × A). Some preliminary research on change actions has already shown that, when generalized in this way, change actions are precisely internal categories, but the consequences of this for change action models (and, *a fortiori*, Cartesian difference categories) are not yet understood. More recently, some work has emerged on differential equations in the language of tangent categories [8]. We believe similar techniques can be applied in a straightforward way to Cartesian difference categories, where they might be of use to give an abstract formalization of discrete dynamical systems and difference equations.

An important open question is whether Cartesian difference categories (or a similar notion) admit an internal language. It is well-known that the differential λ-calculus can be interpreted in Cartesian closed differential categories [14]. Given their similarities, we believe there will be a very similar "difference λ-calculus" which could potentially have applications to automatic differentiation (change structures, a notion similar to change actions, have already been proposed as models of forward-mode automatic differentiation [12], although work in the area seems to have stagnated).

Lastly, we should mention that there are adjunctions between the categories of Cartesian difference categories, change action models, and Cartesian differential categories given by Propositions 1, 2, 3, and 4. These adjunctions will be explored in detail in the upcoming journal version of this paper.

## **References**



## **Contextual Equivalence for Signal Flow Graphs**

Filippo Bonchi<sup>1</sup>, Robin Piedeleu<sup>2⋆</sup>, Paweł Sobociński<sup>3⋆⋆</sup>, and Fabio Zanasi<sup>2⋆</sup>

<sup>1</sup> Università di Pisa, Italy

<sup>2</sup> University College London, UK, {r.piedeleu, f.zanasi}@ucl.ac.uk
<sup>3</sup> Tallinn University of Technology, Estonia

**Abstract.** We extend the signal flow calculus—a compositional account of the classical signal flow graph model of computation—to encompass affine behaviour, and furnish it with a novel operational semantics. The increased expressive power allows us to define a canonical notion of contextual equivalence, which we show to coincide with denotational equality. Finally, we characterise the realisable fragment of the calculus: those terms that express the computations of (affine) signal flow graphs.

**Keywords:** signal flow graphs · affine relations · full abstraction · contextual equivalence · string diagrams

## **1 Introduction**

Compositional accounts of models of computation often lead one to consider *relational* models because a decomposition of an input-output system might consist of internal parts where flow and causality are not always easy to assign. These insights led Willems [33] to introduce a new current of control theory, called *behavioural* control: roughly speaking, behaviours and observations are of prime concern, notions such as state, inputs or outputs are secondary. Independently, programming language theory converged on similar ideas, with *contextual equivalence* [25,28] often considered as *the* equivalence: programs are judged to be different if we can find some context in which one behaves differently from the other, and what is observed about "behaviour" is often something quite canonical and simple, such as termination. Hoare [17] and Milner [23] discovered that these programming language theory innovations also bore fruit in the nondeterministic context of concurrency. Here again, research converged on studying simple and canonical contextual equivalences [24,18].

This paper brings together all of the above threads. The model of computation of interest for us is that of signal flow graphs [32,21], which are feedback systems well known in control theory [21] and widely used in the modelling of linear dynamical systems (in continuous time) and signal processing circuits (in

<sup>⋆</sup> Supported by EPSRC grant EP/R020604/1.

<sup>⋆⋆</sup> Supported by the ESF funded Estonian IT Academy research measure (project 2014-2020.4.05.19-0001).

© The Author(s) 2020

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 77–96, 2020. https://doi.org/10.1007/978-3-030-45231-5_5

discrete time). The *signal flow calculus* [10,9] is a syntactic presentation with an underlying compositional denotational semantics in terms of linear relations. Armed with *string diagrams* [31] as a syntax, the tools and concepts of programming language theory and concurrency theory can be put to work and the calculus can be equipped with a structural operational semantics. However, while in previous work [9] a connection was made between operational equivalence (essentially trace equivalence) and denotational equality, the signal flow calculus was not quite expressive enough for contextual equivalence to be a useful notion.

The crucial step turns out to be moving from *linear* relations to *affine* relations, i.e. linear subspaces translated by a vector. In recent work [6], we showed that they can be used to study important physical phenomena, such as current and voltage sources in electrical engineering, as well as fundamental synchronisation primitives in concurrency, such as mutual exclusion. Here we show that, in addition to yielding compelling mathematical domains, affinity proves to be the magic ingredient that ties the different components of the story of signal flow graphs together: it provides us with a canonical and simple notion of observation to use for the *definition* of contextual equivalence, and gives us the expressive power to prove a bona fide full abstraction result that relates contextual equivalence with denotational equality.

To obtain the above result, we extend the signal flow calculus to handle affine behaviour. While the denotational semantics and axiomatic theory appeared in [6], the operational account appears here for the first time and requires some technical innovations: instead of traces, we consider *trajectories*, which are infinite traces that may start in the past. To record the time, states of our transition system have a runtime environment that keeps track of the global clock.

Because the affine signal flow calculus is oblivious to flow directionality, some terms exhibit pathological operational behaviour. We illustrate these phenomena with several examples. Nevertheless, for the linear sub-calculus, it is known [9] that every term is denotationally equal to an executable realisation: one that is in a form where a consistent flow can be identified, like the classical notion of signal flow graph. We show that the question has a more subtle answer in the affine extension: not all terms are realisable as (affine) signal flow graphs. However, we are able to characterise the class of diagrams for which this is true.

*Related work.* Several authors studied signal flow graphs by exploiting concepts and techniques of programming language semantics, see e.g. [4,22,29,2]. The most relevant for this paper is [2], which, independently from [10], proposed the same syntax and axiomatisation for the ordinary signal flow calculus and shares with our contribution the same methodology: the use of *string diagrams* as a mathematical playground for the compositional study of different sorts of systems. The idea is common to diverse, cross-disciplinary research programmes, including Categorical Quantum Mechanics [1,11,12], Categorical Network Theory [3], Monoidal Computer [26,27] and the analysis of (a)synchronous circuits [14,15].

*Outline.* In Section 2 we recall the affine signal flow calculus. Section 3 introduces the operational semantics for the calculus. Section 4 defines contextual equivalence and proves full abstraction. Section 5 introduces a well-behaved class of circuits that denote functional input-output systems, laying the groundwork for Section 6, in which the concept of realisability is introduced, followed by a characterisation of which circuit diagrams are realisable. Missing proofs can be found in the extended version of this paper [7].

## **2 Background: the Affine Signal Flow Calculus**

The *Affine Signal Flow Calculus* extends the signal flow calculus [9] with an extra generator that makes it possible to express affine relations. In this section, we first recall its syntax and denotational semantics from [6], and then highlight two key properties, enabled by the affine extension, that are needed for proving full abstraction. The operational semantics is deferred to the next section.

*(Fig. 1 display: the generators of row (1) are assigned sorts (1, 2) (copy), (1, 0) (discard), (2, 1) (add), (0, 1) (zero and the affine generator), and (1, 1) (amplifier and register); those of row (2) carry the reflected sorts; the identity has sort (1, 1), the twist (2, 2), and the empty circuit (0, 0); c ; d has sort (n, k) when c : (n, m) and d : (m, k), and c ⊕ d has sort (n1 + n2, m1 + m2).)*

**Fig. 1.** Sort inference rules.

#### **2.1 Syntax**

*(Grammar display: rows (1) and (2) list the generator icons and their mirror images; row (3) adds the identity and twist wires, the empty circuit, and the two binary operations:)*

$$c \;::=\; \cdots \;\mid\; c \oplus c \;\mid\; c \,;\, c \tag{3}$$

The syntax of the calculus, generated by the grammar above, is parametrised over a given field k, with scalars k ranging over k. We refer to the constants in rows (1)-(2) as *generators*. Terms are constructed from the generators, the identity and twist wires, the empty circuit, and the two binary operations in (3). We will only consider those terms that are *sortable*, i.e. those that can be associated with a pair (n, m), with n, m ∈ **N**. Sortable terms are called *circuits*: intuitively, a circuit with sort (n, m) has n ports on the left and m on the right. The sorting discipline is given in Fig. 1. We delay discussion of computational intuitions to Section 3 but, for the time being, we observe that the generators of row (2) are those of row (1) "reflected about the y-axis".
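The sorting discipline is straightforward to mechanise. The following Python sketch is a hypothetical encoding (the generator names and the term representation are ours, not the calculus's notation): it computes the sort of a term bottom-up, returning `None` for unsortable terms.

```python
# Hypothetical encoding of the sorting discipline: each generator carries a
# sort (n, m); ("seq", c, d) is defined only when the middle sorts agree,
# and ("par", c, d) adds sorts componentwise.
GENERATOR_SORTS = {
    "copy": (1, 2), "discard": (1, 0), "add": (2, 1), "zero": (0, 1),
    "one": (0, 1), "amp": (1, 1), "reg": (1, 1), "id": (1, 1),
    "twist": (2, 2), "empty": (0, 0),
}

def sort_of(term):
    """Return the sort (n, m) of a term, or None if it is not sortable."""
    if isinstance(term, str):
        return GENERATOR_SORTS.get(term)
    op, c, d = term
    sc, sd = sort_of(c), sort_of(d)
    if sc is None or sd is None:
        return None
    if op == "seq":
        return (sc[0], sd[1]) if sc[1] == sd[0] else None
    if op == "par":
        return (sc[0] + sd[0], sc[1] + sd[1])
    return None
```

Sequential composition is defined only when the middle sorts agree, mirroring the side condition of the sorting rules of Fig. 1.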

#### **2.2 String Diagrams**

It is convenient to consider circuits as the arrows of a symmetric monoidal category ACirc (for Affine Circuits). Objects of ACirc are natural numbers (thus ACirc is a *prop* [19]) and morphisms n → m are the circuits of sort (n, m), quotiented by the laws of symmetric monoidal categories [20,31].<sup>4</sup> The circuit grammar yields the symmetric monoidal structure of ACirc: sequential composition is given by c ; d, the monoidal product is given by c ⊕ d, and identities and symmetries are built by pasting together identity and twist wires in the obvious way. We will adopt the usual convention of writing morphisms of ACirc as *string diagrams*,

meaning that c ; c′ is drawn by joining the right ports of c to the left ports of c′, and c ⊕ c′ by stacking c above c′. More succinctly, ACirc is the free prop on the generators (1)-(2). The free prop on (1)-(2) without the two affine generators, hereafter called Circ, is the signal flow calculus from [9].

### **2.3 Denotational Semantics and Axiomatisation**

The semantics of circuits can be given denotationally by means of affine relations.

**Definition 1.** *Let* k *be a field. An* affine subspace *of* k<sup>d</sup> *is a subset* V ⊆ k<sup>d</sup> *that is either empty or for which there exist a vector* a ∈ k<sup>d</sup> *and a linear subspace* L *of* k<sup>d</sup> *such that* V = {a + v | v ∈ L}*. A* k-affine relation *of type* n → m *is an affine subspace of* k<sup>n</sup> × k<sup>m</sup>*, considered as a* k*-vector space.*
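Concretely, deciding whether a target vector lies in an affine subspace V = {a + v | v ∈ L} reduces to a linear span check on the translated vector. The Python sketch below is our own illustration, working over the rationals with exact `Fraction` arithmetic rather than a general field k:

```python
from fractions import Fraction

def in_span(vectors, target):
    """Decide whether `target` lies in the linear span of `vectors` over Q,
    by reducing against an echelon basis built by Gaussian elimination."""
    t = [Fraction(x) for x in target]
    basis = []
    for r in vectors:
        r = [Fraction(x) for x in r]
        for b in basis:
            p = next(i for i, x in enumerate(b) if x != 0)  # pivot of b
            if r[p] != 0:
                f = r[p] / b[p]
                r = [ri - f * bi for ri, bi in zip(r, b)]
        if any(x != 0 for x in r):
            basis.append(r)
    # reduce the target against the basis; membership iff it reduces to 0
    for b in basis:
        p = next(i for i, x in enumerate(b) if x != 0)
        if t[p] != 0:
            f = t[p] / b[p]
            t = [ti - f * bi for ti, bi in zip(t, b)]
    return all(x == 0 for x in t)

def in_affine(point, basis, target):
    """target in point + span(basis)?  (Definition 1, with k = Q.)"""
    a = [Fraction(x) for x in point]
    return in_span(basis, [t - p for t, p in zip(target, a)])
```

For example, the affine line (1, 0) + span{(0, 1)} contains (1, 5) but not (2, 0).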

Note that every linear subspace is affine, taking a above to be the zero vector. Affine relations can be organised into a prop:

**Definition 2.** *Let* k *be a field. Let* ARel<sup>k</sup> *be the following prop:*

- *arrows* n → m *are the* k*-affine relations of type* n → m*;*
- *composition is relational:* G ; H = {(u, w) | *there is* v *with* (u, v) ∈ G *and* (v, w) ∈ H}*;*
- *monoidal product given by*
$$G \oplus H = \left\{ \left( \begin{pmatrix} u \\ u' \end{pmatrix}, \begin{pmatrix} v \\ v' \end{pmatrix} \right) \mid (u, v) \in G, (u', v') \in H \right\}.$$

In order to give semantics to ACirc, we use the prop of affine relations over the field k(x) of fractions of polynomials in x with coefficients from k. Elements q ∈ k(x) are fractions
$$q = \frac{k_0 + k_1 x + k_2 x^2 + \cdots + k_n x^n}{l_0 + l_1 x + l_2 x^2 + \cdots + l_m x^m}$$
for some n, m ∈ **N** and k<sub>i</sub>, l<sub>i</sub> ∈ k. Sum, product, 0 and 1 in k(x) are defined as usual.
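Arithmetic in k(x) can be prototyped directly on coefficient lists (lowest degree first). In this Python sketch, which is our own illustration with k = Q, a fraction is a pair (numerator, denominator) and equality is decided by cross-multiplication, so no gcd normalisation is needed:

```python
from fractions import Fraction

def trim(p):
    # drop trailing zero coefficients so equal polynomials compare equal
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def padd(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def pmul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

# a rational function is a pair (num, den) of coefficient lists
def rf_eq(a, b):   # p/q = r/s  iff  p*s = r*q
    return pmul(a[0], b[1]) == pmul(b[0], a[1])

def rf_mul(a, b):
    return (pmul(a[0], b[0]), pmul(a[1], b[1]))

def rf_add(a, b):
    return (padd(pmul(a[0], b[1]), pmul(b[0], a[1])), pmul(a[1], b[1]))
```

With these helpers one can check, for instance, that 1/(1 − x) · (1 − x) = 1 and that 1/(1 − x) = 1 + x · 1/(1 − x), the fraction 1/(1 − x) being the one that appears in the text below.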

<sup>4</sup> This quotient is harmless: both the denotational semantics from [6] and the operational semantics we introduce in this paper satisfy those axioms on the nose.

**Definition 3.** *The prop morphism* [[·]]: ACirc → ARelk(x) *is inductively defined on circuits as follows. For the generators in* (1)

$$\begin{array}{ll}
\text{copy} \longmapsto \left\{ \left( p, \begin{pmatrix} p \\ p \end{pmatrix} \right) \,\middle|\, p \in \mathsf{k}(x) \right\} &
\text{add} \longmapsto \left\{ \left( \begin{pmatrix} p \\ q \end{pmatrix}, p+q \right) \,\middle|\, p, q \in \mathsf{k}(x) \right\} \\[6pt]
\text{discard} \longmapsto \{ (p, \bullet) \mid p \in \mathsf{k}(x) \} &
\text{zero} \longmapsto \{ (\bullet, 0) \} \qquad \text{one} \longmapsto \{ (\bullet, 1) \} \\[6pt]
r\text{-amplifier} \longmapsto \{ (p, p \cdot r) \mid p \in \mathsf{k}(x) \} &
\text{register} \longmapsto \{ (p, p \cdot x) \mid p \in \mathsf{k}(x) \}
\end{array}$$

*where* • *is the only element of* <sup>k</sup>(x)<sup>0</sup>*. The semantics of components in* (2) *is symmetric, e.g. is mapped to* {(p, •) | p ∈ k(x)}*. For* (3)

$$\begin{array}{ll}
\text{identity} \longmapsto \{ (p, p) \mid p \in \mathsf{k}(x) \} &
\text{twist} \longmapsto \left\{ \left( \begin{pmatrix} p \\ q \end{pmatrix}, \begin{pmatrix} q \\ p \end{pmatrix} \right) \,\middle|\, p, q \in \mathsf{k}(x) \right\} \\[6pt]
\text{empty} \longmapsto \{ (\bullet, \bullet) \} &
c_1 \oplus c_2 \longmapsto [\![c_1]\!] \oplus [\![c_2]\!] \qquad
c_1 ; c_2 \longmapsto [\![c_1]\!] \,;\, [\![c_2]\!]
\end{array}$$

The reader can easily check that the pair of 1-dimensional vectors (1, 1/(1 − x)) ∈ k(x)<sup>1</sup> × k(x)<sup>1</sup> belongs to the denotation of the circuit in Example 1.

The denotational semantics enjoys a sound and complete axiomatisation. The axioms involve only basic interactions between the generators (1)-(2). The resulting theory is that of *Affine Interacting Hopf Algebras* (**aIH**). The generators in (1) form a Hopf algebra, those in (2) form another Hopf algebra, and the interaction of the two gives rise to two Frobenius algebras. We refer the reader to [6] for the full set of equations and all further details.

**Proposition 1.** *For all* c, d *in* ACirc*,* [[c]] = [[d]] *if and only if* c **aIH** = d*.*

#### **2.4 Affine vs Linear Circuits**

It is important to highlight the differences between ACirc and Circ. The latter is the purely linear fragment: circuit diagrams of Circ denote exactly the *linear* relations over k(x) [8], while those of ACirc denote the *affine* relations over k(x).

The additional expressivity afforded by affine circuits is essential for our development. One crucial property is that every polynomial fraction can be expressed as an affine circuit of sort (0, 1).

**Lemma 1.** *For all* p ∈ k(x)*, there is* c<sup>p</sup> ∈ ACirc[0, 1] *with* [[cp]] = {(•, p)}*.*

*Proof.* For each p ∈ k(x), let P be the linear subspace generated by the pair of 1-dimensional vectors (1, p). By fullness of the denotational semantics of Circ [8], there exists a circuit c in Circ such that [[c]] = P. Precomposing with the affine generator for 1 then yields [[1 ; c]] = {(•, 1)} ; P = {(•, p)}.

The above observation yields the following:

**Proposition 2.** *Let* (u, v) <sup>∈</sup> <sup>k</sup>(x)<sup>n</sup> <sup>×</sup>k(x)<sup>m</sup>*. There exist circuits* <sup>c</sup><sup>u</sup> <sup>∈</sup> ACirc[0, n] *and* c<sup>v</sup> ∈ ACirc[m, 0] *such that* [[cu]] = {(•, u)} *and* [[cv]] = {(v, •)}*.*

*Proof.* Let u = (p<sub>1</sub>, ..., p<sub>n</sub>)<sup>T</sup> and v = (q<sub>1</sub>, ..., q<sub>m</sub>)<sup>T</sup>. By Lemma 1, for each p<sub>i</sub>, there exists a circuit c<sub>pi</sub> such that [[c<sub>pi</sub>]] = {(•, p<sub>i</sub>)}. Let c<sub>u</sub> = c<sub>p1</sub> ⊕ ... ⊕ c<sub>pn</sub>. Then [[c<sub>u</sub>]] = {(•, u)}. For c<sub>v</sub>, it is enough to see that Lemma 1 also holds with 0 and 1 switched, and then use the argument above.

Proposition 2 asserts that any behaviour (u, v) occurring in the denotation of some circuit c, i.e., such that (u, v) ∈ [[c]], can be expressed by a pair of circuits (cu, cv). We will, in due course, think of such a pair as a *context*, namely an environment with which a circuit can interact. Observe that this is not possible with the linear fragment Circ, since the only singleton linear subspace is 0.

Another difference between linear and affine concerns circuits of sort (0, 0). Indeed k(x)<sup>0</sup> = {•}, and the only linear relation over k(x)<sup>0</sup> × k(x)<sup>0</sup> is the singleton {(•, •)}, which is id<sup>0</sup> in ARelk(x). But there is another affine relation, namely the *empty relation* ∅ ⊆ k(x)<sup>0</sup> × k(x)<sup>0</sup>. It can be represented, for instance, by composing the affine generator for 1 with the mirrored generator for 0, since {(•, 1)} ; {(0, •)} = ∅.

**Proposition 3.** *Let* c ∈ ACirc[0, 0]*. Then* [[c]] *is either* id<sup>0</sup> *or* ∅*.*

## **3 Operational Semantics for Affine Circuits**

Here we give the structural operational semantics of affine circuits, building on previous work [9] that considered only the core linear fragment, Circ. We consider circuits to be *programs* that have an observable behaviour. Observations are possible interactions at the circuit's interface. Since there are two interfaces, a left one and a right one, each transition carries two labels.

In a transition $t \rhd c \xrightarrow[w]{v} t' \rhd c'$, c and c′ are *states*, that is, circuits augmented with information about which values k ∈ k are stored in each register at that instant of the computation. When transitioning to c′, the v above the arrow is a vector of values with which c synchronises on the left, and the w below the arrow accounts for the synchronisation on the right. States are decorated with runtime contexts: t and t′ are (possibly negative) integers that—intuitively—indicate the time when the transition happens. Indeed, in Fig. 2, every rule advances time by 1 unit, so t′ = t + 1. "Negative time" is important: as we shall see in Example 3, some executions must start in the past.

The rules in the top section of Fig. 2 provide the semantics for the generators in (1): the *copier* duplicates the signal arriving on the left; the *discarder* accepts any signal on the left and discards it, producing nothing on the right; the *adder* takes two signals on the left and emits their sum on the right; the *zero* generator emits the constant 0 signal on the right; the *amplifier* multiplies the signal on the left by the scalar k ∈ k. All the generators described so far are stateless. State is provided by the *register*: a synchronous one-place buffer with the value l stored. When it receives some value k on the left, it emits l on the right and stores k. The behaviour of the affine generator

*(Derivation rules of Fig. 2: one axiom per generator of rows (1)-(2), giving the left and right labels of a transition and, for registers, the stored-value update; time-dependent axioms for the affine generators, which emit 1 exactly when t = 0 and 0 otherwise; axioms for identity, twist and the empty circuit; and inference rules for ; and ⊕, which require the shared interface to agree and the components to step simultaneously. Every rule advances the runtime context from t to t + 1.)*

**Fig. 2.** Structural rules for operational semantics, with t ∈ **Z**, k, l ranging over k and u, v, w vectors of elements of k of the appropriate size. The only vector of k<sup>0</sup> is written • (as in Definition 3), while a vector (k<sub>1</sub> ... k<sub>n</sub>)<sup>T</sup> ∈ k<sup>n</sup> is written k<sub>1</sub> ... k<sub>n</sub>.

depends on the time: when t = 0, it emits 1, otherwise it emits 0. Observe that the behaviour of all other generators is time-independent.

So far, we have described the behaviour of the components in (1) using the intuition that signal flows from left to right: in a transition $\xrightarrow[w]{v}$, the signal v on the left is thought of as the trigger and w as the effect. For the generators in (2), whose behaviour is defined by the rules in the second section of Fig. 2, the situation is symmetric—indeed, here it is helpful to think of signals as flowing from right to left. The next section of Fig. 2 specifies the behaviours of the structural connectors of (3): the *twist* swaps two signals, the empty circuit performs no interaction, and the *identity* wire forwards signals unchanged: the signals on the left and on the right ports are equal. Finally, the rule for sequential composition ; forces the two components to agree on the same value v at the shared interface, while for parallel composition ⊕, components proceed independently. Observe that both forms of composition require component transitions to happen at the same time.
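For circuits in which the signal genuinely flows left to right, the rules in the top section of Fig. 2 can be read as an evaluator. The Python sketch below is a hypothetical rendering (the function names are ours, and only the generators of row (1) are covered): each component maps the values on its left ports at time t to the values on its right ports, with the register threading its store:

```python
# Each generator is a function (t, ins, state) -> outs, following the
# left-to-right reading of the rules in the top section of Fig. 2.
def copy(t, ins, state):       # sort (1, 2): duplicate the incoming signal
    return [ins[0], ins[0]]

def discard(t, ins, state):    # sort (1, 0): accept and drop the signal
    return []

def add(t, ins, state):        # sort (2, 1): emit the sum of two signals
    return [ins[0] + ins[1]]

def zero(t, ins, state):       # sort (0, 1): constant 0 signal
    return [0]

def one(t, ins, state):        # sort (0, 1): affine generator, 1 only at t = 0
    return [1 if t == 0 else 0]

def amp(k):                    # sort (1, 1): multiply by the scalar k
    return lambda t, ins, state: [k * ins[0]]

def register(t, ins, state):   # sort (1, 1): emit the store, store the input
    out, state[0] = state[0], ins[0]
    return [out]
```

Feeding the sequence 5, 7, 9 into a register initialised with 0 yields 0, 5, 7: the buffer delays the signal by one time step, matching the denotation p ↦ p · x.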

**Definition 4.** *Let* c ∈ ACirc*. The* initial state c<sup>0</sup> *of* c *is the one where all the registers store* 0*. A* computation *of* c *starting at time* t ≤ 0 *is a (possibly infinite) sequence of transitions*

$$t \rhd c_0 \xrightarrow[w_t]{v_t} t{+}1 \rhd c_1 \xrightarrow[w_{t+1}]{v_{t+1}} t{+}2 \rhd c_2 \xrightarrow[w_{t+2}]{v_{t+2}} \cdots \tag{4}$$

Since all transitions increment the time by 1, it suffices to record the time at which a computation starts. As a result, to simplify notation, we will omit the runtime context after the first transition and, instead of (4), write

$$t \rhd c_0 \xrightarrow[w_t]{v_t} c_1 \xrightarrow[w_{t+1}]{v_{t+1}} c_2 \xrightarrow[w_{t+2}]{v_{t+2}} \cdots$$

*Example 2.* The circuit in Example 1 can perform the following computation.

In the example above, the flow has a clear left-to-right orientation, albeit with a feedback loop. For arbitrary circuits of ACirc this is not always the case, which sometimes results in unexpected operational behaviour.

*Example 3.* In the composite of the affine generator for 1 with a reversed register it is not possible to identify a consistent flow: the generator's signal goes from left to right, while the register's goes from right to left. Observe that there is no computation starting at t = 0, since in the initial state the register contains 0 while the generator must emit 1. There is, however, a (unique!) computation starting at time t = −1, which loads the register with 1 so that the generator can emit 1 at time t = 0.

(writing $x\langle l\rangle$ for a register storing $l$)
$$-1 \rhd\; x\langle 0\rangle \xrightarrow[1]{\bullet} x\langle 1\rangle \xrightarrow[0]{\bullet} x\langle 0\rangle \xrightarrow[0]{\bullet} x\langle 0\rangle \xrightarrow[0]{\bullet} \cdots$$

Similarly, the variant with two reversed registers features a unique computation starting at time t = −2.

$$-2 \rhd\; x\langle 0\rangle\, x\langle 0\rangle \xrightarrow[1]{\bullet} x\langle 0\rangle\, x\langle 1\rangle \xrightarrow[0]{\bullet} x\langle 1\rangle\, x\langle 0\rangle \xrightarrow[0]{\bullet} x\langle 0\rangle\, x\langle 0\rangle \xrightarrow[0]{\bullet} \cdots$$

It is worthwhile clarifying the reason why, in the affine calculus, some computations start in the past. As we have already mentioned, in the linear fragment the semantics of all generators is time-independent. It follows easily that time-independence is a property enjoyed by all purely linear circuits. The behaviour of the affine generator, however, enforces a particular action to occur at time 0. Composing it with a right-to-left register, as in Example 3, anticipates that action by one step, to time −1. It is obvious that this construction can be iterated, and it follows that the presence of a single time-dependent generator results in a calculus in which the computation of some terms must start at a finite, but unbounded, time in the past.

*Example 4.* Another circuit with conflicting flow is the composite of the affine generator for 1 with the mirrored generator for 0. Here there is no possible transition at t = 0, since at that time the first component must emit a 1 and the second can only synchronise on a 0. Instead, the composite of the generator for 1 with its own mirror image can always perform an infinite computation $t \rhd \xrightarrow[\bullet]{\bullet} \xrightarrow[\bullet]{\bullet} \cdots$, for any t ≤ 0. Roughly speaking, the computations of these two (0, 0) circuits are operational mirror images of the two possible denotations of Proposition 3. This intuition will be made formal in Section 4. For now, it is worth observing that for all c, the parallel composite of the second circuit with c can perform the same computations as c, while the parallel composite of the first circuit with c cannot ever make a transition at time 0.

*Example 5.* Consider the circuit composed of a reversed register followed by a register, which again features conflicting flow. Our equational theory equates it with the identity wire, but the computations involved are subtly different. Indeed, for any sequence a<sub>i</sub> ∈ k, it is obvious that the identity admits the computation

$$0 \rhd\; \xrightarrow[a_0]{a_0} \; \xrightarrow[a_1]{a_1} \; \xrightarrow[a_2]{a_2} \cdots \tag{5}$$

The circuit of reversed register followed by register admits a similar computation, but we must begin at time t = −1 in order to first "load" the registers with a<sub>0</sub>:

(writing $\bar{x}\langle l\rangle$ for a reversed register storing $l$)
$$-1 \rhd\; \bar{x}\langle 0\rangle\, x\langle 0\rangle \xrightarrow[0]{0} \bar{x}\langle a_0\rangle\, x\langle a_0\rangle \xrightarrow[a_0]{a_0} \bar{x}\langle a_1\rangle\, x\langle a_1\rangle \xrightarrow[a_1]{a_1} \cdots \tag{6}$$

The circuit composed of a register followed by a reversed register, which again is equated with the identity by the equational theory, is more tricky. Although every computation of the identity can be reproduced, this circuit admits additional, problematic computations. Indeed, consider

$$0 \rhd\; x\langle 0\rangle\, \bar{x}\langle 0\rangle \xrightarrow[0]{0} x\langle 0\rangle\, \bar{x}\langle 0\rangle \xrightarrow[1]{0} x\langle 0\rangle\, \bar{x}\langle 1\rangle \tag{7}$$

at which point no further transition is possible—the circuit can deadlock.
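Under our reading of the rules, the contrast between the two register composites can be made concrete by enumerating one-step transitions over the stored values. In this hypothetical Python sketch (the encoding is ours), a state is the pair of stores; in the register-then-reversed-register composite both registers emit onto the shared middle wire, so a step exists only when the stores agree, whereas in the reversed-register-then-register composite the middle wire carries an arbitrary value that both registers then store:

```python
def step_x_then_xbar(state, a, b):
    """One transition of register ; reversed-register with boundary
    labels (a, b).  Both registers emit their stored value onto the shared
    middle wire, so a step exists only when the stores agree."""
    l, r = state
    if l != r:
        return None            # stuck: the middle wire cannot take two values
    return (a, b)              # the register stores a, the reversed one b

def step_xbar_then_x(state, v):
    """One transition of reversed-register ; register: the middle wire
    carries an arbitrary value v, which both registers store; the boundary
    labels are forced to be the old stores."""
    l, r = state
    return (l, r), (v, v)      # (emitted labels, next state)
```

Starting from (0, 0), choosing boundary labels 0 and 1 drives the first composite into the stuck state (0, 1), reproducing computation (7), while the second composite can always fire another transition.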

The following lemma is an easy consequence of the rules of Fig. 2 and follows by structural induction. It states that all circuits can stay idle *in the past*.

**Lemma 2.** *Let* c ∈ ACirc[n, m] *with initial state* c<sub>0</sub>*. Then* $t \rhd c_0 \xrightarrow[0]{0} c_0$ *if* t < 0*.*

#### **3.1 Trajectories**

For the non-affine version of the signal flow calculus, we studied in [9] *traces* arising from computations. For the affine extension, this is not possible since, as explained above, we must also consider computations that start in the past. In this paper, rather than traces we adopt a common control theoretic notion.

**Definition 5.** *An* (n, m)*-*trajectory <sup>σ</sup> *is a* **<sup>Z</sup>***-indexed sequence* <sup>σ</sup> : **<sup>Z</sup>** <sup>→</sup> <sup>k</sup><sup>n</sup> <sup>×</sup> <sup>k</sup><sup>m</sup> *that is finite in the past, i.e., for which* ∃j ∈ **Z** *such that* σ(i) = (0, 0) *for* i ≤ j*.*

By the universal property of the product we can identify σ : **Z** → k<sup>n</sup> × k<sup>m</sup> with the pairing ⟨σ<sub>l</sub>, σ<sub>r</sub>⟩ of σ<sub>l</sub> : **Z** → k<sup>n</sup> and σ<sub>r</sub> : **Z** → k<sup>m</sup>. A (k, m)-trajectory σ and an (m, n)-trajectory τ are *compatible* if σ<sub>r</sub> = τ<sub>l</sub>. In this case, we can define their composite, a (k, n)-trajectory σ ; τ, by σ ; τ := ⟨σ<sub>l</sub>, τ<sub>r</sub>⟩. Given an (n<sub>1</sub>, m<sub>1</sub>)-trajectory σ<sub>1</sub> and an (n<sub>2</sub>, m<sub>2</sub>)-trajectory σ<sub>2</sub>, their product, an (n<sub>1</sub>+n<sub>2</sub>, m<sub>1</sub>+m<sub>2</sub>)-trajectory σ<sub>1</sub> ⊕ σ<sub>2</sub>, is defined by $(\sigma_1 \oplus \sigma_2)(i) := \begin{pmatrix} \sigma_1(i) \\ \sigma_2(i) \end{pmatrix}$, pairing left and right components separately. Using these two operations we can organise *sets* of trajectories into a prop.

**Definition 6.** *The composition of two sets of trajectories is defined as* S ; T := {σ ; τ | σ ∈ S, τ ∈ T *are compatible*}. *The product of sets of trajectories is defined as* S<sup>1</sup> ⊕ S<sup>2</sup> := {σ<sup>1</sup> ⊕ σ<sup>2</sup> | σ<sup>1</sup> ∈ S1, σ<sup>2</sup> ∈ S2}.

Clearly both operations are strictly associative. The unit for ⊕ is the singleton containing the unique (0, 0)-trajectory. Also, ; has a two-sided identity, given by sets of "copycat" (n, n)-trajectories. Indeed, we have that:

**Proposition 4.** *Sets of* (n, m)*-trajectories are the arrows* n → m *of a prop* Traj *with composition and monoidal product given as in Definition 6.*
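The prop operations of Definition 6 can be prototyped on finite samples of trajectories. The Python sketch below is our own illustration: a trajectory is sampled on a finite window (times before the window are implicitly zero), boundaries are tuples, composition hides the shared middle boundary, and the product pairs boundaries componentwise:

```python
def traj(start, pairs):
    """Finite sample of a trajectory: maps each time start + k to a pair
    (u, v) of boundary tuples; earlier times are implicitly zero."""
    return {start + k: p for k, p in enumerate(pairs)}

def compatible(s, t):
    # sigma_r = tau_l at every sampled time (same window assumed)
    return s.keys() == t.keys() and all(s[i][1] == t[i][0] for i in s)

def compose(s, t):
    # (sigma ; tau)(i) = (sigma_l(i), tau_r(i)): the middle boundary is hidden
    assert compatible(s, t)
    return {i: (s[i][0], t[i][1]) for i in s}

def product(s, t):
    # (sigma1 + sigma2)(i) pairs left boundaries and right boundaries
    assert s.keys() == t.keys()
    return {i: (s[i][0] + t[i][0], s[i][1] + t[i][1]) for i in s}
```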

Traj serves as the domain for our operational semantics: given a circuit c and an *infinite* computation

$$t \rhd c\_0 \xrightarrow[v\_t]{u\_t} c\_1 \xrightarrow[v\_{t+1}]{u\_{t+1}} c\_2 \xrightarrow[v\_{t+2}]{u\_{t+2}} \cdots$$

its associated trajectory σ is

$$\sigma(i) = \begin{cases} (u\_i, v\_i) & \text{if } i \ge t, \\ (0, 0) & \text{otherwise.} \end{cases} \tag{8}$$

**Definition 7.** *For a circuit* c*,* ⟨c⟩ *is the set of trajectories given by its infinite computations, following the translation* (8) *above.*

The assignment c ↦ ⟨c⟩ is compositional, that is:

**Theorem 1.** ⟨·⟩: ACirc → Traj *is a morphism of props.*

*Example 6.* Consider the computations (5) and (6) from Example 5. According to (8), both are translated into the trajectory σ mapping i ≥ 0 to (a<sub>i</sub>, a<sub>i</sub>) and i < 0 to (0, 0). The reader can easily verify that, more generally, it holds that ⟨ ⟩ = ⟨ <sup>x</sup> <sup>x</sup> ⟩. At this point it is worth remarking that the two circuits would be distinguished by their traces: the trace of computation (5) is different from the trace of (6). Indeed, the full abstraction result in [9] does not hold for all circuits, but only for those of a certain kind. The affine extension obliges us to consider computations that start in the past and, in turn, this drives us toward a stronger full abstraction result, shown in the next section.

Before concluding, it is important to emphasise that ⟨ ⟩ = ⟨ <sup>x</sup> <sup>x</sup> ⟩ also holds. Indeed, problematic computations, like (7), are all finite and, by definition, do not give rise to any trajectory. The reader should note that the use of trajectories is not a semantic device to get rid of problematic computations. In fact, trajectories do not appear in the statement of our full abstraction result; they are merely a convenient tool to prove it. Another result (Proposition 9) independently takes care of ruling out problematic computations.

## **4 Contextual Equivalence and Full Abstraction**

This section contains the main contribution of the paper: a traditional full abstraction result asserting that contextual equivalence agrees with denotational equivalence. It is not a coincidence that we prove this result in the affine setting: affinity plays a crucial role, both in its statement and proof. In particular, Proposition 3 gives us two possibilities for the denotation of (0, 0)-circuits: *(i)* ∅—which, roughly speaking, means that there is a problem (see e.g. Example 4) and no infinite computation is possible—or *(ii)* id<sub>0</sub>, in which case infinite computations are possible. This provides us with a basic notion of observation, akin to observing termination vs non-termination in the λ-calculus.

**Definition 8.** *For a circuit* c ∈ ACirc[0, 0] *we write* c ↑ *if* c *can perform an infinite computation and* c /↑ *otherwise. For instance* ↑*, while* /↑*.*

To be able to make observations about arbitrary circuits we need to introduce an appropriate notion of context. Roughly speaking, contexts for us are (0, 0)-circuits with a hole into which we can plug another circuit. Since ours is a variable-free presentation, "dangling wires" assume the role of free variables [16]: restricting to (0, 0) contexts is therefore analogous to considering *ground* contexts—i.e. contexts with no free variables—a standard concept of programming language theory.

To define contexts formally, we extend the syntax of Section 2.1 with an extra generator "−" of sort (n, m). A (0, 0)-circuit of this extended syntax is a *context* when "−" occurs exactly once. Given an (n, m)-circuit c and a context C[−], we write C[c] for the circuit obtained by replacing the unique occurrence of "−" by c.

With this setup, given an (n, m)-circuit c, we can insert it into a context C[−] and observe the possible outcome: either C[c] ↑ or C[c] /↑. This naturally leads us to contextual equivalence and the statement of our main result.

**Definition 9.** *Given* c, d ∈ ACirc[n, m]*, we say that they are* contextually equivalent*, written* c ≡ d*, if for all contexts* C[−]*,*

$$C[c] \uparrow \iff C[d] \uparrow.$$

*Example 7.* Recall from Example 5 the circuits and <sup>x</sup> <sup>x</sup> . Take the context C[−] = c<sub>σ</sub> ; − ; c<sub>τ</sub> for c<sub>σ</sub> ∈ ACirc[0, 1] and c<sub>τ</sub> ∈ ACirc[1, 0]. Assume that c<sub>σ</sub> and c<sub>τ</sub> each have a single infinite computation. Call σ and τ the corresponding trajectories. If σ = τ, both C[ ] and C[ <sup>x</sup> <sup>x</sup> ] would be able to perform an infinite computation. Instead, if σ ≠ τ, neither of them would perform any infinite computation: C[ ] would stop at time t, for t the first moment such that σ(t) ≠ τ(t), while C[ <sup>x</sup> <sup>x</sup> ] would stop at time t + 1.

Now take as context C[−] = ; − ; . In contrast to c<sub>σ</sub> and c<sub>τ</sub>, and can perform more than one computation: at any time they can nondeterministically emit any value. Thus every computation of C[ ] =

can *always* be extended to an infinite one, forcing synchronisation of and at each step. For C[ x x ] = x x , and may emit different values at time t, but the computation will get stuck at t + 1. However, our definition of ↑ only cares about whether C[ <sup>x</sup> <sup>x</sup> ] *can* perform an infinite computation. Indeed it can, as long as and consistently emit the same value at each time step.

If we think of contexts as tests, and say that a circuit c passes test C[−] if C[c] performs an infinite computation, then our notion of contextual equivalence is *may-testing* equivalence [13]. From this perspective, and <sup>x</sup> <sup>x</sup> are not *must equivalent*, since the former must pass the test ; − ; while <sup>x</sup> <sup>x</sup> may not. It is worth remarking here that the distinction between may and must testing will cease to make sense in Section 5, where we identify a certain class of circuits equipped with a proper flow directionality and thus a deterministic, input-output, behaviour.

#### **Theorem 2 (Full abstraction).** c ≡ d *iff* c = d *in* **aIH***.*

The remainder of this section is devoted to the proof of Theorem 2. We will start by clarifying the relationship between fractions of polynomials (the denotational domain) and trajectories (the operational domain).

#### **4.1 From Polynomial Fractions to Trajectories**

The missing link between polynomial fractions and trajectories are *(formal) Laurent series*: we now recall this notion. Formally, a Laurent series is a function σ : **Z** → k for which there exists j ∈ **Z** such that σ(i) = 0 for all i < j. We write σ as ..., σ(−1), σ(0), σ(1), ... with position 0 underlined, or as the formal sum Σ<sub>i=d</sub><sup>∞</sup> σ(i)x<sup>i</sup>. Each Laurent series σ then has a *degree* d ∈ **Z**, the index of its first non-zero coefficient. Laurent series form a field k((x)): sum is pointwise, product is by convolution, and the inverse σ<sup>−1</sup> of σ with degree d is defined as:

$$\sigma^{-1}(i) = \begin{cases} 0 & \text{if } i < -d \\ \sigma(d)^{-1} & \text{if } i = -d \\ -\sigma(d)^{-1} \cdot \sum\_{j=1}^{n} \left( \sigma(d+j) \cdot \sigma^{-1}(-d+n-j) \right) & \text{if } i = -d+n \text{ for } n > 0 \end{cases} \tag{9}$$
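As a sanity check, the recurrence (9) can be run: the sketch below (our illustration, not the paper's) implements it on truncated coefficient lists using exact rational arithmetic; convolving σ with the computed σ<sup>−1</sup> yields the series 1.

```python
from fractions import Fraction

def inverse(d, coeffs):
    """Coefficients of sigma^{-1} per (9): sigma has degree d and
    sigma(d + j) = coeffs[j], with coeffs[0] != 0.  Returns the degree -d
    of sigma^{-1} together with its first len(coeffs) coefficients."""
    inv = [Fraction(1) / coeffs[0]]                  # sigma^{-1}(-d)
    for n in range(1, len(coeffs)):
        s = sum(coeffs[j] * inv[n - j] for j in range(1, n + 1))
        inv.append(-s / coeffs[0])                   # sigma^{-1}(-d + n)
    return -d, inv

def convolve(a, b):
    """Truncated product (convolution) of two coefficient lists."""
    return [sum(a[j] * b[k - j] for j in range(k + 1))
            for k in range(min(len(a), len(b)))]
```

For example, inverting 1 − x produces the coefficient stream 1, 1, 1, ..., i.e. the power series of 1/(1 − x).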

Note that (formal) power series, which form 'just' a ring k[[x]], are a particular case of Laurent series, namely those σ for which d ≥ 0. What is most interesting for our purposes is how polynomials and fractions of polynomials relate to k((x)) and k[[x]]. First, the ring k[x] of polynomials embeds into k[[x]], and thus into k((x)): a polynomial p<sub>0</sub> + p<sub>1</sub>x + ··· + p<sub>n</sub>x<sup>n</sup> can also be regarded as the power series Σ<sub>i=0</sub><sup>∞</sup> p<sub>i</sub>x<sup>i</sup> with p<sub>i</sub> = 0 for all i > n. Because Laurent series are closed under division, this immediately gives also an embedding of the field of polynomial fractions k(x) into k((x)). Note that the full expressiveness of k((x)) is required: for instance, the fraction 1/x is represented as the Laurent series ..., 0, 1, 0, 0, ..., which is not a power series, because a non-zero value appears before position 0. In fact, the fractions that are expressible as power series are precisely the *rational* fractions, i.e. those of the form (k<sub>0</sub> + k<sub>1</sub>x + k<sub>2</sub>x<sup>2</sup> + ··· + k<sub>n</sub>x<sup>n</sup>) / (l<sub>0</sub> + l<sub>1</sub>x + l<sub>2</sub>x<sup>2</sup> + ··· + l<sub>n</sub>x<sup>n</sup>) where l<sub>0</sub> ≠ 0.

Rational fractions form a ring k⟨x⟩ which, differently from the full field k(x), embeds into k[[x]]. Indeed, whenever l<sub>0</sub> ≠ 0, the inverse of l<sub>0</sub> + l<sub>1</sub>x + l<sub>2</sub>x<sup>2</sup> + ··· + l<sub>n</sub>x<sup>n</sup> is, by (9), a *bona fide* power series. The commutative diagram on the right is a summary.

Relations between k((x))-vectors organise themselves into a prop ARel<sub>k((x))</sub> (see Definition 2). There is an evident prop morphism ι: ARel<sub>k(x)</sub> → ARel<sub>k((x))</sub>: it maps the empty affine relation on k(x) to the one on k((x)), and otherwise applies pointwise the embedding of k(x) into k((x)). For the next step, observe that trajectories are in fact rearrangements of Laurent series: each pair of vectors (u, v) ∈ k((x))<sup>n</sup> × k((x))<sup>m</sup>, as on the left below, yields the trajectory κ(u, v) defined for all i ∈ **Z** as on the right below.

$$(u,v) = \left( \begin{pmatrix} \alpha^1\\ \vdots\\ \alpha^n \end{pmatrix}, \begin{pmatrix} \beta^1\\ \vdots\\ \beta^m \end{pmatrix} \right) \qquad \qquad \kappa(u,v)(i) = \left( \begin{pmatrix} \alpha^1(i)\\ \vdots\\ \alpha^n(i) \end{pmatrix}, \begin{pmatrix} \beta^1(i)\\ \vdots\\ \beta^m(i) \end{pmatrix} \right)$$
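The rearrangement κ is a change of data representation and nothing more; a two-line Python sketch (our illustration, modelling a Laurent series as a function **Z** → k) makes this concrete.

```python
# Illustration (not from the paper): kappa turns a pair of vectors of
# Laurent series into a trajectory, read off pointwise at each instant i.

def kappa(u, v):
    """u: a list of n series, v: a list of m series, each a function Z -> k.
    Returns the trajectory i -> (u-components at i, v-components at i)."""
    return lambda i: (tuple(a(i) for a in u), tuple(b(i) for b in v))

# The power series 1/(1 - x) = 1 + x + x^2 + ... as a function Z -> k.
ones = lambda i: 1 if i >= 0 else 0
```

Sampling the resulting trajectory at a negative instant returns zero vectors, reflecting that Laurent series, like trajectories, are finite in the past.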

Similarly to ι, the assignment κ extends to sets of vectors, and also to a prop morphism from ARelk((x)) to Traj. Together, κ and ι provide the desired link between operational and denotational semantics.

**Theorem 3.** ⟨·⟩ = κ ◦ ι ◦ [[·]]

*Proof.* Since both are symmetric monoidal functors from a free prop, it is enough to check the statement for the generators of ACirc. We show, as an example, the case of . By Definition 3, [[ ]] = {(p, (p, p)) | p ∈ k(x)}. This is mapped by ι to {(α, (α, α)) | α ∈ k((x))}. Now, to see that κ(ι([[ ]])) = ⟨ ⟩, it is enough to observe that a trajectory σ is in κ(ι([[ ]])) precisely when, for all i, there exists some k<sub>i</sub> ∈ k such that σ(i) = (k<sub>i</sub>, (k<sub>i</sub>, k<sub>i</sub>)).

#### **4.2 Proof of Full Abstraction**

We now have the ingredients to prove Theorem 2. First, we prove an adequacy result for (0, 0)-circuits.

**Proposition 5.** *Let* c ∈ ACirc[0, 0]*. Then* [[c]] = id<sub>0</sub> *if and only if* c ↑*.*

*Proof.* By Proposition 3, either [[c]] = id<sub>0</sub> or [[c]] = ∅, which, combined with Theorem 3, means that ⟨c⟩ = κ(ι(id<sub>0</sub>)) or ⟨c⟩ = κ(ι(∅)). By definition of ι, this implies that ⟨c⟩ either contains a trajectory or does not. In the first case c ↑; in the second c /↑.

Next we obtain a result that relates denotational equality in all contexts to equality in **aIH**. Note that it is not trivial: since we consider ground contexts it does not make sense to merely consider "identity" contexts. Instead, it is at this point that we make another crucial use of affinity, taking advantage of the increased expressivity of affine circuits, as showcased by Proposition 2.

#### **Proposition 6.** *If* [[C[c]]] = [[C[d]]] *for all contexts* C[−]*, then* c = d *in* **aIH***.*

*Proof.* We prove the contrapositive: suppose that c ≠ d in **aIH**. Then [[c]] ≠ [[d]]. Since both [[c]] and [[d]] are affine relations over k(x), there exists a pair of vectors (u, v) ∈ k(x)<sup>n</sup> × k(x)<sup>m</sup> that is in one of [[c]] and [[d]], but not both. Assume wlog that (u, v) ∈ [[c]] and (u, v) ∉ [[d]]. By Proposition 2, there exist c<sub>u</sub> and c<sub>v</sub> such that [[c<sub>u</sub> ; c ; c<sub>v</sub>]] = [[c<sub>u</sub>]] ; [[c]] ; [[c<sub>v</sub>]] = {(•, u)} ; [[c]] ; {(v, •)}. Since (u, v) ∈ [[c]], we have [[c<sub>u</sub> ; c ; c<sub>v</sub>]] = {(•, •)}. Instead, since (u, v) ∉ [[d]], we have that [[c<sub>u</sub> ; d ; c<sub>v</sub>]] = ∅. Therefore, for the context C[−] = c<sub>u</sub> ; − ; c<sub>v</sub>, we have that [[C[c]]] ≠ [[C[d]]].

The proof of our main result is now straightforward.

*Proof of Theorem 2.* Let us first suppose that c = d in **aIH**. Then [[C[c]]] = [[C[d]]] for all contexts C[−], since [[·]] is a morphism of props. By Proposition 5, it follows immediately that C[c] ↑ if and only if C[d] ↑, namely c ≡ d.

Conversely, suppose that, for all C[−], C[c] ↑ iff C[d] ↑. Again by Proposition 5, we have that [[C[c]]] = [[C[d]]]. We conclude by invoking Proposition 6.

## **5 Functional Behaviour and Signal Flow Graphs**

There is a sub-prop SF of Circ of classical *signal flow graphs* (see *e.g.* [21]). Here the signal flows left-to-right, possibly featuring *feedback loops*, provided that these go through at least one register. Feedback can be captured algebraically via an operation Tr(·): Circ[n + 1, m + 1] → Circ[n, m] taking c : n + 1 → m + 1 to:

Following [9], let us call Circ−→ the free sub-prop of Circ of circuits built from (3) and the generators of (1), without . Then SF is defined as the closure of Circ−→ under Tr(·). For instance, the circuit of Example 2 is in SF.

Signal flow graphs are intimately connected to the executability of circuits. In general, the rules of Figure 2 do not assume a fixed flow orientation. As a result, some circuits in Circ are not executable as *functional input-output* systems, as we have demonstrated with <sup>x</sup> , and <sup>x</sup> <sup>x</sup> of Examples 3-5. Notice that none of these are signal flow graphs. In fact, the circuits of SF do not have pathological behaviour, as we shall state more precisely in Proposition 9.

At the denotational level, signal flow graphs correspond precisely to *rational* functional behaviours, that is, matrices whose coefficients are in the ring k⟨x⟩ of *rational fractions* (see Section 4.1). We call such matrices *rational matrices*. One may check that the semantics of a signal flow graph c : (n, m) is always of the form [[c]] = {(v, A · v) | v ∈ k(x)<sup>n</sup>}, for some m × n rational matrix A. Conversely, all relations that are graphs of rational matrices can be expressed as signal flow graphs.

**Proposition 7.** *Given* c : (n, m)*, we have* [[c]] = {(p, A · p) | p ∈ k(x)<sup>n</sup>} *for some rational* m × n *matrix* A *iff there exists a signal flow graph* f*, i.e., a circuit* f : (n, m) *of* SF*, such that* [[f]] = [[c]]*.*

*Proof.* This is a folklore result in control theory which can be found in [30]. The details of the translation between rational matrices and circuits of SF can be found in [10, Section 7].

The following gives an alternative characterisation of rational matrices—and therefore, by Proposition 7, of the behaviour of signal flow graphs—that clarifies their role as realisations of circuits.

**Proposition 8.** *An* m × n *matrix* A *is rational iff* A · r ∈ k⟨x⟩<sup>m</sup> *for all* r ∈ k⟨x⟩<sup>n</sup>*.*

Proposition 8 is another guarantee of good behaviour—it justifies the name of inputs (resp. outputs) for the left (resp. right) ports of signal flow graphs. Recall from Section 4.1 that rational fractions can be mapped to Laurent series of nonnegative degree, i.e., to plain power series. Operationally, these correspond to trajectories that start after t = 0. Proposition 8 guarantees that any trajectory of a signal flow graph whose first nonzero value on the left appears at time t = 0, will not have nonzero values on the right starting before time t = 0. In other words, signal flow graphs can be seen as processing a stream of values from left to right. As a result, their ports can be clearly partitioned into inputs and outputs.
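This stream-processing reading can be made concrete with a hypothetical sketch (the encoding is ours): a single register acts as a unit delay, and feeding the output of an adder back through it executes the signal flow graph whose behaviour is the rational fraction 1/(1 − x), i.e. the running-sum stream function.

```python
def running_sum(inputs):
    """Execute a one-register signal flow graph with feedback:
    y(t) = u(t) + y(t - 1), i.e. y = u + x*y, hence y = u / (1 - x).
    The output stream is the input convolved with 1 + x + x^2 + ...."""
    reg = 0                   # the register holds the previous output
    out = []
    for u in inputs:
        y = u + reg           # adder: current input plus fed-back value
        out.append(y)
        reg = y               # store y for the next tick (unit delay)
    return out
```

Starting a nonzero input at t = 0 indeed produces no nonzero output before t = 0, matching the guarantee given by Proposition 8.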

But the circuits of SF are too restrictive for our purposes. For example, <sup>x</sup> can also be seen to realise a functional behaviour transforming inputs on the left into outputs on the right yet it is not in SF. Its behaviour is no longer linear, but affine. Hence, we need to extend signal flow graphs to include functional affine behaviour. The following definition does just that.

**Definition 10.** *Let* ASF *be the sub-prop of* ACirc *obtained from* all *the generators in* (1)*, closed under* Tr(·)*. Its circuits are called* affine signal flow graphs*.*

As before, none of <sup>x</sup> , and <sup>x</sup> <sup>x</sup> from Examples 3-5 are affine signal flow graphs. In fact, ASF rules out pathological behaviour: all computations can be extended to be infinite, or in other words, do not get stuck.

**Proposition 9.** *Given an affine signal flow graph* f*, for every computation*

$$t \rhd f\_0 \xrightarrow[v\_t]{u\_t} f\_1 \xrightarrow[v\_{t+1}]{u\_{t+1}} \dots f\_n$$

*there exists a trajectory* σ ∈ ⟨f⟩ *such that* σ(i) = (u<sub>i</sub>, v<sub>i</sub>) *for* t ≤ i ≤ t + n*.*

*Proof.* By induction on the structure of affine signal flow graphs.

If SF circuits correspond precisely to k⟨x⟩-matrices, those of ASF correspond precisely to k⟨x⟩-affine transformations.

**Definition 11.** *A map* f: k(x)<sup>n</sup> → k(x)<sup>m</sup> *is an* affine map *if there exists an* m × n *matrix* A *and* b ∈ k(x)<sup>m</sup> *such that* f(p) = A · p + b *for all* p ∈ k(x)<sup>n</sup>*. We call the pair* (A, b) *the* representation *of* f*.*

The notion of rational affine map is a straightforward extension of the linear case and so is the characterisation in terms of rational input-output behaviour.

**Definition 12.** *An affine map* f: p ↦ A · p + b *is* rational *if* A *and* b *have coefficients in* k⟨x⟩*.*

**Proposition 10.** *An affine map* f: k(x)<sup>n</sup> → k(x)<sup>m</sup> *is rational iff* f(r) ∈ k⟨x⟩<sup>m</sup> *for all* r ∈ k⟨x⟩<sup>n</sup>*.*
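Definitions 11 and 12 can be illustrated over truncated power series (a sketch under our own encoding, not the paper's: coefficient lists stand in for elements of k(x)):

```python
# Illustrative sketch (ours) of an affine map f(p) = A . p + b acting on
# truncated power series; a series is a list of coefficients.

def add(a, b):
    n = max(len(a), len(b))
    get = lambda s, k: s[k] if k < len(s) else 0
    return [get(a, k) + get(b, k) for k in range(n)]

def mul(a, b):
    """Truncated convolution product of two series."""
    n = max(len(a), len(b))
    get = lambda s, k: s[k] if k < len(s) else 0
    return [sum(get(a, j) * get(b, k - j) for j in range(k + 1))
            for k in range(n)]

def affine(A, b, p):
    """f(p) = A . p + b: A is an m x n matrix of series, b a vector of m
    series, p a vector of n series."""
    out = []
    for i in range(len(A)):
        row = [0]
        for j in range(len(p)):
            row = add(row, mul(A[i][j], p[j]))
        out.append(add(row, b[i]))
    return out
```

For instance, the one-dimensional affine map p ↦ x·p + 1 (a register with an added constant, loosely speaking) sends the constant series 5 to 1 + 5x.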

The following extends the correspondence of Proposition 7, showing that ASF is the rightful affine heir of SF.

**Proposition 11.** *Given* c : (n, m)*, we have* [[c]] = {(p, f(p)) | p ∈ k(x)<sup>n</sup>} *for some rational affine map* f *iff there exists an affine signal flow graph* g*, i.e., a circuit* g : (n, m) *of* ASF*, such that* [[g]] = [[c]]*.*

*Proof.* Let f be given by p ↦ Ap + b for some rational m × n matrix A and vector b ∈ k⟨x⟩<sup>m</sup>. By Proposition 7, we can find a circuit c<sub>A</sub> of SF such that

[[c<sub>A</sub>]] = {(p, A · p) | p ∈ k(x)<sup>n</sup>}. Similarly, we can represent b as a signal flow graph c<sub>b</sub> of sort (1, m). Then, the circuit on the right is clearly in ASF and verifies [[c]] = {(p, Ap + b) | p ∈ k(x)<sup>n</sup>} as required.

For the converse direction it is straightforward to check by structural induction that the denotation of affine signal flow graphs is the graph (in the set-theoretic sense of pairs of values) of some rational affine map.

## **6 Realisability**

In the previous section we gave a restricted class of morphisms with good behavioural properties. We may wonder how much of ACirc we can capture with this restricted class. The answer is, in a precise sense: most of it.

Surprisingly, the behaviours realisable in Circ, the purely linear fragment, are no more expressive. In fact, from an operational (or denotational, by full abstraction) point of view, Circ is nothing more than a jumbled-up version of SF. Indeed, it turns out that Circ enjoys a *realisability* theorem: any circuit c of Circ can be associated with one of SF that implements, or *realises*, the behaviour of c in an executable form.

But the corresponding realisation may not flow neatly from left to right like signal flow graphs do—its inputs and outputs may have been moved from one side to the other. Consider for example, the circuit on the right

It does not belong to SF, but it can be read as a signal flow graph with an input that has been bent and moved to the bottom right. The behaviour it realises can therefore be executed by rewiring this port to obtain a signal flow graph:

We will not make this notion of rewiring precise here but refer the reader to [9] for the details. The intuition is simply that a rewiring partitions the ports of a circuit into two sets—that we call inputs and outputs—and uses or to bend input ports to the left and output ports to the right. The realisability theorem then states that we can always recover a (not necessarily unique) signal flow graph from any circuit by performing these operations.

**Theorem 4.** *[9, Theorem 5] Every circuit in* Circ *is equivalent to the rewiring of a signal flow graph, called its* realisation*.*

This theorem allows us to extend the notion of inputs and outputs to all circuits of Circ.

**Definition 13.** *A port of a circuit* c *of* Circ *is an* input *(resp.* output*) port, if there exists a realisation for which it is an input (resp. output).*

Note that, since realisations are not necessarily unique, the same port can be both an input and an output. Then, the realisability theorem (Theorem 4) says that every port is always an input, an output or both (but never neither).

An output-only port is an output port that is not an input port. Similarly, an input-only port is an input port that is not an output port.

*Example 8.* The left port of the register <sup>x</sup> is input-only, whereas its right port is output-only. In the identity wire, both ports are input and output ports. The single port of is output-only; that of is input-only.

While in the purely linear case, all behaviours are realisable, the general case of ACirc is a bit more subtle. To make this precise, we can extend our definition of realisability to include affine signal flow graphs.

**Definition 14.** *A circuit of* ACirc *is* realisable *if its ports can be rewired so that it is equivalent to a circuit of* ASF*.*

*Example 9.* is realisable; <sup>x</sup> is not.

Notice that Proposition 11 gives the following equivalent semantic criterion for realisability: realisable behaviours are precisely those that map rationals to rationals.

**Theorem 5.** *A circuit* c *is realisable iff its ports can be partitioned into two sets, that we call inputs and outputs, such that the corresponding rewiring of* c *is an affine rational map from inputs to outputs.*

We offer another perspective on realisability below: realisable behaviours correspond precisely to those for which the constants are connected to inputs of the underlying Circ-circuit. First, notice that, since

(1-dup) = and (1-del) =

in **aIH**, we can assume without loss of generality that each circuit contains exactly one .

**Proposition 12.** *Every circuit* c *of* ACirc *is equivalent to one with precisely one and no .*

For c : (n, m) a circuit of ACirc, we will call ĉ the circuit of Circ of sort (n + 1, m) that one obtains by first transforming c into an equivalent circuit with a single and no as above, then removing this , and replacing it by an identity wire that extends to the left boundary.

**Theorem 6.** *A circuit* c *is realisable iff is connected to an input port of* ĉ*.*

## **7 Conclusion and Future Work**

We introduced the operational semantics of the *affine* extension of the signal flow calculus and proved that contextual equivalence coincides with denotational equality, previously introduced and axiomatised in [6]. We have observed that, at the denotational level, affinity provides two key properties (Propositions 2 and 3) for the proof of full abstraction. However, at the operational level, affinity forces us to consider computations starting in the *past* (Example 3) as the syntax allows terms lacking a proper flow directionality. This leads to circuits that might deadlock ( in Example 4) or perform some problematic computations ( <sup>x</sup> <sup>x</sup> in Example 5). We have identified a proper subclass of circuits, called affine signal flow graphs (Definition 10), that possess an inherent flow directionality: in these circuits, the same pathological behaviours do not arise (Proposition 9). This class is not too restrictive as it captures all desirable behaviours: a realisability result (Theorem 5) states that all and only the circuits that do not need computations to start in the past are equivalent to (the rewiring of) an affine signal flow graph.

The reader may be wondering why we do not restrict the syntax to affine signal flow graphs. The reason is that, like in the behavioural approach to control theory [33], the lack of flow direction is what allows the (affine) signal flow calculus to achieve a strong form of compositionality and a complete axiomatisation (see [9] for a deeper discussion).

We expect that similar methods and results can be extended to other models of computation. Our next step is to tackle Petri nets, which, as shown in [5], can be regarded as terms of the signal flow calculus, but over **N** rather than a field.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Parameterized Synthesis for Fragments of First-Order Logic over Data Words**

Béatrice Bérard<sup>1</sup>, Benedikt Bollig<sup>2</sup>, Mathieu Lehaut<sup>1</sup>(-), and Nathalie Sznajder<sup>1</sup>

<sup>1</sup> Sorbonne Université, CNRS, LIP6, F-75005 Paris, France <sup>2</sup> CNRS, LSV & ENS Paris-Saclay, Université Paris-Saclay, Cachan, France

**Abstract.** We study the synthesis problem for systems with a parameterized number of processes. As in the classical case due to Church, the system selects actions depending on the program run so far, with the aim of fulfilling a given specification. The difficulty is that, at the same time, the environment executes actions that the system cannot control. In contrast to the case of fixed, finite alphabets, here we consider the case of parameterized alphabets. An alphabet reflects the number of processes, which is static but unknown. The synthesis problem then asks whether there is a finite number of processes for which the system can satisfy the specification. This variant is already undecidable for very limited logics. Therefore, we consider a first-order logic without the order on word positions. We show that even in this restricted case synthesis is undecidable if both the system and the environment have access to all processes. On the other hand, we prove that the problem is decidable if the environment only has access to a bounded number of processes. In that case, there is even a cutoff meaning that it is enough to examine a bounded number of process architectures to solve the synthesis problem.

## **1 Introduction**

Synthesis deals with the problem of automatically generating a program that satisfies a given specification. The problem goes back to Church [9], who formulated it as follows: The environment and the system alternately select an input symbol and an output symbol from a finite alphabet, respectively, and in this way generate an infinite sequence. The question now is whether the system has a winning strategy, which guarantees that the resulting infinite run is contained in a given ω-regular language representing the specification, no matter how the environment behaves. This problem is decidable and very well understood [8, 37], and it has been extended in several different ways (e.g., [24, 26, 28, 36, 43]).
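For a safety specification over a finite alphabet, the core of Church's problem reduces to a two-player safety game solved by a greatest-fixpoint computation. The sketch below is our own minimal illustration (finite arena, safety objective only), far simpler than the parameterized setting of this paper:

```python
# Illustrative safety-game solver (not from the paper).  `edges` maps each
# state to its successors; in states of `system` the system chooses the
# successor, elsewhere the environment does.  The system wins from a state
# if it can keep the play inside `safe` forever.  Deadlocked states are
# treated as losing for the player who must move.

def winning_region(edges, system, safe):
    """Greatest fixpoint of W = safe intersected with CPre(W)."""
    W = set(safe)
    while True:
        def controllable(s):
            succ = edges.get(s, [])
            if s in system:   # system to move: one safe successor suffices
                return any(t in W for t in succ)
            # environment to move: every successor must stay safe
            return bool(succ) and all(t in W for t in succ)
        W2 = {s for s in W if controllable(s)}
        if W2 == W:
            return W
        W = W2
```

In the small arena used in the test, state 1 belongs to the environment and only leads outside the safe set, so it drops out of the winning region, while state 0 survives because the system may loop on it forever.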

In this paper, we consider a variant of the synthesis problem that allows us to model programs with a variable number of processes. As we then deal with an unbounded number of process identifiers, a fixed finite alphabet is not suitable anymore. It is more appropriate to use an infinite alphabet, in which every

<sup>-</sup>Partly supported by ANR FREDDA (ANR-17-CE40-0013).

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 97–118, 2020. https://doi.org/10.1007/978-3-030-45231-5\_6

letter contains a process identifier and a program action. One can distinguish two cases here. In [16], a potentially infinite number of data values are involved in an infinite program run (e.g. by dynamic process generation). In a parameterized system [4, 13], on the other hand, one has an unknown but static number of processes so that, along each run, the number of processes is finite. In this paper, we are interested in the latter, i.e., parameterized case. Parameterized programs are ubiquitous and occur, e.g., in distributed algorithms, ad-hoc networks, telecommunication protocols, cache-coherence protocols, swarm robotics, and biological systems. The synthesis question asks whether the system has a winning strategy for some number of processes (existential version) or no matter how many processes there are (universal version).

Over infinite alphabets, there are a variety of different specification languages (e.g., [5, 11, 12, 19, 29, 33, 40]). Unlike in the case of finite alphabets, there is no canonical definition of regular languages. In fact, the synthesis problem has been studied for N-memory automata [7], the Logic of Repeating Values [16], and register automata [15, 30, 31]. Though there is no agreement on a "regular" automata model, first-order (FO) logic over data words can be considered a canonical logic, and this is the specification language we consider here. In addition to classical FO logic on words over finite alphabets, it provides a predicate x ∼ y to express that two events x and y are triggered by the same process. Its two-variable fragment FO<sup>2</sup> has a decidable emptiness and universality problem [5] and is, therefore, a promising candidate for the synthesis problem.

Previous generalizations of Church's synthesis problem to infinite alphabets were generally synchronous in the sense that the system and the environment perform their actions in strictly alternating order. This assumption was made, e.g., in the above-mentioned recent papers [7, 15, 16, 30, 31]. If there are several processes, however, it is realistic to relax this condition, which leads us to an asynchronous setting in which the system has no influence on when the environment acts. Like in [21], where the asynchronous case for a fixed number of processes was considered, we only make the reasonable fairness assumption that the system is not blocked forever.

In summary, the synthesis problem over infinite alphabets can be classified as (i) parameterized vs. dynamic, (ii) synchronous vs. asynchronous, and (iii) according to the specification language (register automata, Logic of Repeating Values, FO logic, etc.). As explained above, we consider here the parameterized asynchronous case for specifications written in FO logic. To the best of our knowledge, this combination has not been considered before. For flexible modeling, we also distinguish between three types of processes: those that can only be controlled by the system; those that can only be controlled by the environment; and finally those that can be triggered by both. A partition into system and environment processes is also made in [3,18], but for a fixed number of processes and in the presence of an arena in terms of a Petri net.

Let us briefly describe our results. We show that the general case of the synthesis problem is undecidable for FO<sup>2</sup> logic. This follows from an adaptation of an undecidability result from [16,17] for a fragment of the Logic of Repeating Values [11]. We therefore concentrate on an orthogonal logic, namely FO without the order on the word positions. First, we show that this logic can essentially count processes and actions of a given process up to some threshold. Though it has limited expressive power (albeit orthogonal to that of FO<sup>2</sup>), it leads to intricate behaviors in the presence of an uncontrollable environment. In fact, we show that the synthesis problem is still undecidable. Due to the lack of the order relation, the proof requires a subtle reduction from the reachability problem in 2-counter Minsky machines. However, it turns out that the synthesis problem is decidable if the number of processes that are controllable by the environment is bounded, while the number of system processes remains unbounded. In this case, there is even a cutoff k, an important measure for parameterized systems (cf. [4] for an overview): If the system has a winning strategy for k processes, then it has one for any number of processes greater than k, and the same applies to the environment. The proofs of both main results rely on a reduction of the synthesis problem to turn-based parameterized vector games, in which, similar to Petri nets, tokens corresponding to processes are moved around between states.

The paper is structured as follows. In Section 2, we define FO logic (especially FO without word order), and in Section 3, we present the parameterized synthesis problem. In Section 4, we transform a given formula into a normal form and finally into a parameterized vector game. Based on this reduction, we investigate cutoff properties and show our (un)decidability results in Section 5. We conclude in Section 6. Some proof details can be found in the long version of this paper [2].

## **2 Preliminaries**

For a finite or infinite alphabet Σ, let Σ<sup>∗</sup> and Σ<sup>ω</sup> denote the sets of finite and, respectively, infinite words over Σ. The empty word is ε. Given w ∈ Σ<sup>∗</sup> ∪ Σ<sup>ω</sup>, let |w| denote the length of w and Pos(w) its set of positions: |w| = n and Pos(w) = {1,...,n} if w = σ1σ2 ...σn ∈ Σ<sup>∗</sup>, and |w| = ω and Pos(w) = {1, 2,...} if w ∈ Σ<sup>ω</sup>. Let w[i] be the i-th letter of w for all i ∈ Pos(w).

**Executions.** We consider programs involving a finite (but not fixed) number of processes. Processes are controlled by antagonistic protagonists, System and Environment. Accordingly, each process has a type among T = {s, e, se}, and we let Ps, Pe, and Pse denote the pairwise disjoint finite sets of processes controlled by System, by Environment, and by both System and Environment, respectively. We let P denote the triple (Ps, Pe, Pse). Abusing notation, we sometimes refer to P as the disjoint union Ps ∪ Pe ∪ Pse.

Given any set S, vectors s ∈ S<sup>T</sup> are usually referred to as triples s = (ss, se, sse). Moreover, for s, s′ ∈ N<sup>T</sup>, we write s ≤ s′ if sθ ≤ s′θ for all θ ∈ T. Finally, let s + s′ = (ss + s′s, se + s′e, sse + s′se).

Processes can execute actions from a finite alphabet A. Whenever an action is executed, we would like to know whether it was triggered by System or by Environment. Therefore, A is partitioned into A = As ⊎ Ae. Let Σs = As × (Ps ∪ Pse) and Σe = Ae × (Pe ∪ Pse). Their union Σ = Σs ∪ Σe is the set of events. A word w ∈ Σ<sup>∗</sup> ∪ Σ<sup>ω</sup> is called a P-execution.

**Fig. 1.** Representation of P-execution as a mathematical structure

**Logic.** Formulas of our logic are evaluated over P-executions. We fix an infinite supply V = {x, y, z, ...} of variables, which are interpreted as processes from P or positions of the execution. The logic FOA[∼, <, +1] is given by the grammar

$$\varphi \;::=\; \theta(x) \mid a(x) \mid x = y \mid x \sim y \mid x < y \mid +1(x, y) \mid \neg \varphi \mid \varphi \lor \varphi \mid \exists x. \varphi$$

where x, y ∈ V, θ ∈ T, and a ∈ A. Conjunction (∧), universal quantification (∀), implication (=⇒), true, and false are obtained as abbreviations as usual.

Let ϕ ∈ FOA[∼, <, +1]. By Free(ϕ) ⊆ V, we denote the set of variables that occur free in ϕ. If Free(ϕ) = ∅, then we call ϕ a sentence. We sometimes write ϕ(x1,...,xn) to emphasize the fact that Free(ϕ) ⊆ {x1,...,xn}.

To evaluate ϕ over a P-execution w = (a1, p1)(a2, p2)..., we consider (P, w) as a structure S(P,w) = (P ⊎ Pos(w), Ps, Pe, Pse, (Ra)a∈A, ∼, <, +1) where P ⊎ Pos(w) is the universe, Ps, Pe, and Pse are interpreted as unary relations, Ra is the unary relation {i ∈ Pos(w) | ai = a}, < = {(i, j) ∈ Pos(w) × Pos(w) | i < j}, +1 = {(i, i + 1) | 1 ≤ i < |w|}, and ∼ is the smallest equivalence relation over P ⊎ Pos(w) containing all pairs (i, pi) with i ∈ Pos(w), i.e., relating every position with the process that executes it.


An equivalence class of ∼ is often simply referred to as a class. Note that it contains exactly one process.

Example 1. Suppose As = {a, b} and Ae = {c, d}. Let the set of processes P be given by Ps = {1, 2, 3}, Pe = {4, 5}, and Pse = {6, 7, 8}. Moreover, let w = (a, 1)(b, 8)(d, 7)(c, 4)(a, 6)(c, 6)(a, 7)(d, 6)(b, 2)(d, 7)(a, 7) ∈ Σ<sup>∗</sup>. Figure 1 illustrates S(P,w). The edge relation represents +1; its transitive closure is <.

An interpretation for (P, w) is a partial mapping I : V → P ∪ Pos(w). Suppose ϕ ∈ FOA[∼, <, +1] such that Free(ϕ) ⊆ dom(I). The satisfaction relation (P, w), I |= ϕ is then defined as expected, based on the structure S(P,w) and interpreting free variables according to I. For example, let w = (a1, p1)(a2, p2)... and i ∈ Pos(w). Then, for I(x) = i, we have (P, w), I |= a(x) if ai = a.

We identify some fragments of FOA[∼, <, +1]. For R ⊆ {∼, <, +1}, let FOA[R] denote the set of formulas that do not use symbols in {∼, <, +1} \ R. Moreover, FO<sup>2</sup>A[R] denotes the fragment of FOA[R] that uses only two (reusable) variables.

Let ϕ(x1,...,xn, y) ∈ FOA[∼, <, +1] be a formula and m ∈ N. We use ∃<sup>≥m</sup>y.ϕ(x1,...,xn, y) as an abbreviation for

$$\exists y\_1 \dots \exists y\_m. \bigwedge\_{1 \le i < j \le m} \neg(y\_i = y\_j) \land \bigwedge\_{1 \le i \le m} \varphi(x\_1, \dots, x\_n, y\_i),$$

if m > 0, and ∃<sup>≥0</sup>y.ϕ(x1,...,xn, y) = true. Thus, ∃<sup>≥m</sup>y.ϕ says that there are at least m distinct elements that verify ϕ. We also use ∃<sup>=m</sup>y.ϕ as an abbreviation for ∃<sup>≥m</sup>y.ϕ ∧ ¬∃<sup>≥m+1</sup>y.ϕ. Note that ϕ ∈ FOA[R] implies that ∃<sup>≥m</sup>y.ϕ ∈ FOA[R] and ∃<sup>=m</sup>y.ϕ ∈ FOA[R].
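
Spelled out operationally, the two abbreviations just count witnesses. The following is a toy Python rendering of their semantics over a finite universe (function names are ours, not from the paper):

```python
def exists_at_least(m, universe, phi):
    """Semantics of the abbreviation "at least m distinct elements satisfy phi"."""
    return sum(1 for x in universe if phi(x)) >= m

def exists_exactly(m, universe, phi):
    # "exactly m" is defined as: at least m, but not at least m+1
    return exists_at_least(m, universe, phi) and not exists_at_least(m + 1, universe, phi)
```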

Example 2. Let A, P, and w be as in Example 1 and Figure 1.


## **3 Parameterized Synthesis Problem**

We define an asynchronous synthesis problem. A P-strategy (for System) is a mapping f : Σ<sup>∗</sup> → Σs ∪ {ε}. A P-execution w = σ1σ2 ... ∈ Σ<sup>∗</sup> ∪ Σ<sup>ω</sup> is f-compatible if, for all i ∈ Pos(w) such that σi ∈ Σs, we have f(σ1 ...σi−1) = σi. We call w f-fair if the following hold: (i) if w is finite, then f(w) = ε, and (ii) if w is infinite and f(σ1 ...σi−1) ≠ ε for infinitely many i ≥ 1, then σj ∈ Σs for infinitely many j ≥ 1.

Let <sup>ϕ</sup> <sup>∈</sup> FOA[∼, <, +1] be a sentence. We say that <sup>f</sup> is <sup>P</sup>-winning for <sup>ϕ</sup> if, for every <sup>P</sup>-execution <sup>w</sup> that is <sup>f</sup>-compatible and <sup>f</sup>-fair, we have (P, w) <sup>|</sup><sup>=</sup> <sup>ϕ</sup>.

The existence of a P-strategy that is P-winning for a given formula does not depend on the concrete process identities but only on the cardinality of the sets Ps, Pe, and Pse. This motivates the following definition of winning triples for a formula. Given <sup>ϕ</sup>, let Win(ϕ) be the set of triples (ks, ke, kse) <sup>∈</sup> <sup>N</sup><sup>T</sup> for which there is <sup>P</sup> = (Ps, <sup>P</sup>e, <sup>P</sup>se) such that <sup>|</sup>P<sup>θ</sup><sup>|</sup> <sup>=</sup> <sup>k</sup><sup>θ</sup> for all <sup>θ</sup> <sup>∈</sup> <sup>T</sup> and there is a <sup>P</sup>-strategy that is P-winning for ϕ.

Let 0 = {0} and ke, kse ∈ N. In this paper, we focus on the intersection of Win(ϕ) with the sets N × 0 × 0 (which corresponds to the usual satisfiability problem); N × {ke} × {kse} (there is a constant number of environment and mixed processes); N × N × {kse} (there is a constant number of mixed processes); and 0 × 0 × N (each process is controlled by both System and Environment).

**Definition 3 (synthesis problem).** For a fixed F ∈ {FO, FO<sup>2</sup>}, a set of relation symbols R ⊆ {∼, <, +1}, and Ns, Ne, Nse ⊆ N, the (parameterized) synthesis problem is given as follows:


The satisfiability problem for F[R] is defined as Synth(F[R], N, 0, 0).

Example 4. Suppose A<sup>s</sup> = {a, b} and A<sup>e</sup> = {c, d}, and consider the formulas ϕ1–ϕ<sup>4</sup> from Example 2.

First, we have Win(ϕ1) = N<sup>T</sup>. Given an arbitrary P and any total order ⊴ over Ps ∪ Pse, a possible P-strategy f that is P-winning for ϕ1 maps w ∈ Σ<sup>∗</sup> to (a, p) if p is the smallest process from Ps ∪ Pse wrt. ⊴ that does not occur in w, and returns ε for w if all processes from Ps ∪ Pse already occur in w.

For the three formulas ϕ2, ϕ3, and ϕ4, observe that, since d is an environment action, if there is at least one process that is exclusively controlled by Environment, then there is no winning strategy. Hence we must have <sup>P</sup><sup>e</sup> <sup>=</sup> <sup>∅</sup>. In fact, this condition is sufficient in the three cases and the strategies described below show that all three sets Win(ϕ2), Win(ϕ3), and Win(ϕ4) are equal to <sup>N</sup>×0×N.


Another interesting question is whether System (or Environment) has a winning strategy as soon as the number of processes is large enough. This leads to the notion of a cutoff (cf. [4] for an overview): Let Ns, Ne, Nse ⊆ N and W ⊆ N<sup>T</sup>. We call *k***0** ∈ N<sup>T</sup> a cutoff of W wrt. (Ns, Ne, Nse) if *k***0** ∈ Ns × Ne × Nse and either *k* ∈ W for all *k* ∈ Ns × Ne × Nse with *k***0** ≤ *k*, or *k* ∉ W for all *k* ∈ Ns × Ne × Nse with *k***0** ≤ *k*.


Let F ∈ {FO, FO<sup>2</sup>} and R ⊆ {∼, <, +1}. If, for every alphabet A = As ⊎ Ae and every sentence ϕ ∈ FA[R], the set Win(ϕ) has a computable cutoff wrt.


**Table 1.** Summary of results. Our contributions are highlighted in **bold**.

<sup>∗</sup>We show, however, that there is no cutoff.

(Ns, Ne, Nse), then we know that Synth(F[R], Ns, Ne, Nse) is decidable, as it can be reduced to a finite number of simple synthesis problems over a finite alphabet. The latter can be solved, e.g., using attractor-based backward search (cf. [42]). This is how we will show decidability of Synth(FO[∼], N, {ke}, {kse}) for all ke, kse ∈ N.
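
The attractor computation itself is standard. The following is a generic Python sketch for finite turn-based reachability games, with a graph encoding of our own choosing (owner map and successor lists; not the paper's notation):

```python
def attractor(nodes, edges, owner, targets):
    """Set of nodes from which player 0 can force a visit to `targets`.
    owner[v] is 0 or 1; edges[v] lists the successors of v."""
    attr = set(targets)
    changed = True
    while changed:  # iterate the backward step until a fixed point is reached
        changed = False
        for v in nodes:
            if v in attr:
                continue
            succs = edges[v]
            # player 0 needs one successor in the attractor, player 1 all of them
            if owner[v] == 0 and any(s in attr for s in succs):
                attr.add(v)
                changed = True
            elif owner[v] == 1 and succs and all(s in attr for s in succs):
                attr.add(v)
                changed = True
    return attr
```

A winning positional strategy for player 0 can be read off by picking, for each of its nodes in the attractor, any successor added earlier to the set.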

Our contributions are summarized in Table 1. Note that known satisfiability results for data logic apply to our logic, as processes can be simulated by treating every <sup>θ</sup> <sup>∈</sup> <sup>T</sup> as an ordinary letter. Let us first state undecidability of the general synthesis problem, which motivates the study of other FO fragments.

**Theorem 5.** The problem Synth(FO<sup>2</sup>[∼, <, +1], <sup>0</sup>, <sup>0</sup>, <sup>N</sup>) is undecidable.

Proof (sketch). We adapt the proof from [16, 17], which reduces from the halting problem for 2-counter machines. We show that their encoding can be expressed in our logic, even if we restrict it to two variables, and can also be adapted to the asynchronous setting.

## **4 FO[***∼***] and Parameterized Vector Games**

Due to the undecidability result of Theorem 5, one has to switch to other fragments of first-order logic. We will henceforth focus on the logic FO[∼] and establish some important properties, such as a normal form, that will allow us to deduce a couple of results, both positive and negative.

## **4.1 Satisfiability and Normal Form for FO[***∼***]**

We first show that FO[∼] logic essentially allows one to count letters in a class up to some threshold, and to count such classes up to some other threshold. Let B ∈ N and ℓ ∈ {0,...,B}<sup>A</sup>. Intuitively, ℓ(a) imposes a constraint on the number of occurrences of a in a class. We first define an FOA[∼]-formula ψB,ℓ(y) verifying that, in the class defined by y, the number of occurrences of each letter a ∈ A, counted up to B, is ℓ(a):

$$\psi\_{B,\ell}(y) = \bigwedge\_{\substack{a \in A \mid \\ \ell(a) < B}} \exists^{=\ell(a)} z. (y \sim z \land a(z)) \land \bigwedge\_{\substack{a \in A \mid \\ \ell(a) = B}} \exists^{\geq \ell(a)} z. (y \sim z \land a(z))$$
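
On concrete data, ψB,ℓ amounts to comparing capped letter counts. A small Python sketch, assuming an execution is given as a list of (letter, process) pairs and identifying the class of y with its process (function names are ours):

```python
def letter_counts_up_to(execution, p, B, alphabet):
    """Count, capped at B, how often each letter occurs in the class of process p."""
    counts = {a: 0 for a in alphabet}
    for (a, q) in execution:
        if q == p:
            counts[a] = min(counts[a] + 1, B)
    return counts

def psi(execution, p, B, ell):
    """psi_{B,ell}: letters with ell(a) < B occur exactly ell(a) times, letters
    with ell(a) = B occur at least B times; equivalently, the capped counts
    coincide with ell."""
    return letter_counts_up_to(execution, p, B, ell.keys()) == ell
```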

**Theorem 6 (normal form for FO[***∼***]).** Let ϕ ∈ FOA[∼] be a sentence. There is a computable B ∈ N such that ϕ is effectively equivalent to a disjunction of conjunctions of formulas of the form ∃<sup>⋈m</sup>y. θ(y) ∧ ψB,ℓ(y) where ⋈ ∈ {≥, =}, m ∈ N, θ ∈ T, and ℓ ∈ {0,...,B}<sup>A</sup>.

The normal form can be obtained using known normal-form constructions [23,41] for general FO logic [2], or using Ehrenfeucht-Fraïssé games [39], or using a direct inductive transformation in the spirit of [23].

Example 7. Recall the formula ϕ4 = ∀x.(∃<sup>=2</sup>y.(x ∼ y ∧ a(y)) ⇐⇒ ∃<sup>=2</sup>y.(x ∼ y ∧ d(y))) ∈ FOA[∼] from Example 2, over As = {a, b} and Ae = {c, d}. An equivalent formula in normal form is ϕ′4 = ⋀θ∈T, ℓ∈Z ∃<sup>=0</sup>y. θ(y) ∧ ψ3,ℓ(y) where Z is the set of vectors ℓ ∈ {0,..., 3}<sup>A</sup> such that ℓ(a) = 2 ≠ ℓ(d) or ℓ(d) = 2 ≠ ℓ(a). The formula indeed says that there is no class with exactly two occurrences of a but not exactly two occurrences of d, or vice versa, which is equivalent to ϕ4.

Thanks to the normal form, it is sufficient to test finitely many structures to determine whether a given formula is satisfiable:

**Corollary 8.** The satisfiability problem for FO[∼] over data words is decidable. Moreover, every satisfiable FOA[∼] formula has a finite model.

Note that the satisfiability problem for FO<sup>2</sup>[∼] is already NEXPTIME-hard, due to NEXPTIME-hardness for two-variable logic with unary relations only [14, 20,22]. In fact, it is NEXPTIME-complete due to the upper bound for FO<sup>2</sup>[∼, <] [5]. It is worth mentioning that two-variable logic with one equivalence relation on arbitrary structures also has the finite-model property [32].

#### **4.2 From Synthesis to Parameterized Vector Games**

Exploiting the normal form for FOA[∼], we now present a reduction of the synthesis problem to a strictly turn-based two-player game. This game is conceptually simpler and easier to reason about. The reduction works in both directions, which will allow us to derive both decidability and undecidability results.

Note that, given a formula ϕ ∈ FOA[∼] (which we suppose to be in normal form with threshold B), the order of letters in an execution does not matter. Thus, given some P, a reasonable strategy for Environment would be to just "wait and see". More precisely, it does not put Environment into a worse position if, given the current execution w ∈ Σ<sup>∗</sup>, it lets System execute as many actions as it wants in terms of a word u ∈ Σs<sup>∗</sup>. Due to the fairness assumption, System would be able to execute all the letters from u anyway. Environment can even require System to play a word u such that (P, wu) |= ϕ. If System is not able to produce such a word, Environment can just sit back and do nothing. Conversely, upon wu satisfying ϕ, Environment has to be able to come up with a word v ∈ Σe<sup>∗</sup> such that (P, wuv) ̸|= ϕ. This leads to a turn-based game in which System and Environment play in strictly alternating order and have to provide a satisfying and, respectively, falsifying execution.

In a second step, we can get rid of process identifiers: According to our normal form, all we are interested in is the number of processes that agree on their letters counted up to threshold B. That is, a finite execution can be abstracted as a configuration C : L → N<sup>T</sup> where L = {0,...,B}<sup>A</sup>. For ℓ ∈ L and C(ℓ) = (ns, ne, nse), nθ is the number of processes of type θ whose letter count up to threshold B corresponds to ℓ. We can also say that ℓ contains nθ tokens of type θ. If it is System's turn, it will pick some pairs (ℓ, ℓ′) and move some tokens of type θ ∈ {s, se} from ℓ to ℓ′, provided ℓ(a) ≤ ℓ′(a) for all a ∈ As and ℓ(a) = ℓ′(a) for all a ∈ Ae. This actually corresponds to adding more system letters in the corresponding processes. Environment proceeds analogously (with the roles of As and Ae swapped).
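
This abstraction step can be sketched as follows (Python, with an encoding of our own: a process table mapping identities to types, and locations represented as sorted tuples of capped letter counts):

```python
def configuration(execution, processes, B, alphabet):
    """Abstract an execution into a configuration C : L -> N^T.
    processes: dict process -> type in {"s", "e", "se"}.
    Returns a dict mapping each occupied location to a per-type token count."""
    conf = {}
    for p, typ in processes.items():
        # letter count of p's class, capped at B
        counts = {a: 0 for a in alphabet}
        for (a, q) in execution:
            if q == p:
                counts[a] = min(counts[a] + 1, B)
        loc = tuple(sorted(counts.items()))  # hashable representation of the location
        vec = conf.setdefault(loc, {"s": 0, "e": 0, "se": 0})
        vec[typ] += 1  # one more token of this type on the location
    return conf
```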

Finally, the formula ϕ naturally translates to an acceptance condition F ⊆ C<sup>L</sup> over configurations, where C is the set of local acceptance conditions, which are of the form (⋈s ns, ⋈e ne, ⋈se nse) where ⋈s, ⋈e, ⋈se ∈ {=, ≥} and ns, ne, nse ∈ N.

We end up with a turn-based game in which, similarly to a VASS game [1,6, 10,27,38], System and Environment move tokens along vectors from L. Note that, however, our games have a very particular structure so that undecidability for VASS games does not carry over to our setting. Moreover, existing decidability results do not allow us to infer our cutoff results below.

In the following, we will formalize parameterized vector games.

**Definition 9.** A parameterized vector game (or simply game) is given by a triple G = (A, B, F) where A = As ⊎ Ae is the finite alphabet, B ∈ N is a bound, and, letting L = {0,...,B}<sup>A</sup>, F ⊆ C<sup>L</sup> is a finite set called the acceptance condition.

Locations. Let ℓ0 be the location such that ℓ0(a) = 0 for all a ∈ A. For ℓ ∈ L and a ∈ A, we define ℓ + a by (ℓ + a)(b) = ℓ(b) for b ≠ a and (ℓ + a)(a) = min{ℓ(a) + 1, B}. This is extended to all u ∈ A<sup>∗</sup> and a ∈ A by ℓ + ε = ℓ and ℓ + ua = (ℓ + u) + a. By ⟪w⟫, we denote the location ℓ0 + w.
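
The saturated counting behind ℓ + a and ⟪w⟫ can be sketched as follows (Python, with dict-based locations; names are ours):

```python
def add_letter(loc, a, B):
    """The location l + a: increment the count of a, saturating at B."""
    new = dict(loc)  # locations are immutable; work on a copy
    new[a] = min(new[a] + 1, B)
    return new

def loc_of_word(w, B, alphabet):
    """The location denoted by the double-bracket notation: l0 + w."""
    loc = {a: 0 for a in alphabet}  # l0 maps every letter to 0
    for a in w:
        loc = add_letter(loc, a, B)
    return loc
```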

Configurations. As explained above, a configuration of G is a mapping C : L → N<sup>T</sup>. Suppose that, for ℓ ∈ L and θ ∈ T, we have C(ℓ) = (ns, ne, nse). Then, we let C(ℓ, θ) refer to nθ. By Conf, we denote the set of all configurations.

Transitions. A system transition (respectively environment transition) is a mapping τ : L × L → (N × {0} × N) (respectively τ : L × L → ({0} × N × N)) such that, for all (ℓ, ℓ′) ∈ L × L with τ(ℓ, ℓ′) ≠ (0, 0, 0), there is a word w ∈ As<sup>∗</sup> (respectively w ∈ Ae<sup>∗</sup>) such that ℓ′ = ℓ + w. Let Ts denote the set of system transitions, Te the set of environment transitions, and T = Ts ∪ Te the set of all transitions.

For τ ∈ T, let the mappings outτ, inτ : L → N<sup>T</sup> be defined by outτ(ℓ) = ∑ℓ′∈L τ(ℓ, ℓ′) and inτ(ℓ) = ∑ℓ′∈L τ(ℓ′, ℓ) (recall that the sum is component-wise). We say that τ ∈ T is applicable at C ∈ Conf if, for all ℓ ∈ L, we have outτ(ℓ) ≤ C(ℓ) (component-wise). Abusing notation, we let τ(C) denote the configuration C′ defined by C′(ℓ) = C(ℓ) − outτ(ℓ) + inτ(ℓ) for all ℓ ∈ L. Moreover, for τ(ℓ, ℓ′) = (ns, ne, nse) and θ ∈ T, we let τ(ℓ, ℓ′, θ) refer to nθ.
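
Applying a transition is then component-wise bookkeeping. A Python sketch, with τ encoded as a dictionary from location pairs to vectors in N<sup>T</sup> (the encoding is ours):

```python
def out_in(tau, loc):
    """Component-wise sums out_tau(loc) and in_tau(loc) of the transition vectors."""
    out = [0, 0, 0]
    inn = [0, 0, 0]
    for (l1, l2), vec in tau.items():
        if l1 == loc:
            out = [x + y for x, y in zip(out, vec)]
        if l2 == loc:
            inn = [x + y for x, y in zip(inn, vec)]
    return tuple(out), tuple(inn)

def apply_transition(tau, conf):
    """Return tau(C), or None if tau is not applicable at C."""
    locs = set(conf) | {l for pair in tau for l in pair}
    new = {}
    for loc in locs:
        out, inn = out_in(tau, loc)
        cur = conf.get(loc, (0, 0, 0))
        if any(o > c for o, c in zip(out, cur)):
            return None  # applicability condition out_tau(l) <= C(l) violated
        new[loc] = tuple(c - o + i for c, o, i in zip(cur, out, inn))
    return new
```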

Plays. Let C ∈ Conf. We write C |= F if there is κ ∈ F such that, for all ℓ ∈ L, we have C(ℓ) |= κ(ℓ) (in the expected manner). A C-play, or simply play, is a finite sequence π = C0τ1C1τ2C2 ...τnCn alternating between configurations and transitions (with n ≥ 0) such that C0 = C and, for all i ∈ {1,...,n}, Ci = τi(Ci−1) and


The set of all C-plays is denoted by Plays<sup>C</sup> .
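
The acceptance check C |= F can be sketched as follows (Python; local conditions encoded as triples of a comparison symbol and a constant, an encoding of our own):

```python
# the trivial local condition (>=0, >=0, >=0), used as default
TOP = ((">=", 0), (">=", 0), (">=", 0))

def holds(vec, cond):
    """vec = (ns, ne, nse); cond = triple of ("=", n) or (">=", n)."""
    return all(v == n if op == "=" else v >= n for v, (op, n) in zip(vec, cond))

def accepts(conf, F):
    """C |= F iff some kappa in F holds at every location; kappa maps locations
    to local conditions (default TOP), conf maps locations to token vectors
    (default: no tokens)."""
    return any(
        all(holds(conf.get(l, (0, 0, 0)), kappa.get(l, TOP))
            for l in set(conf) | set(kappa))
        for kappa in F)
```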

Strategies. A C-strategy for System is a partial mapping f : PlaysC → Ts such that f(C) is defined and, for all π = C0τ1C1 ...τiCi ∈ PlaysC with τ = f(π) defined, we have that τ is applicable at Ci and τ(Ci) |= F. Play π = C0τ1C1 ...τnCn is


We say that f is winning for System (from C) if all f-compatible f-maximal C-plays are winning. Finally, C is winning if there is a C-strategy that is winning. Note that, given an initial configuration C, we deal with an acyclic finite reachability game, so that, if there is a winning C-strategy, then there is a positional one, which only depends on the last configuration.

For *k* ∈ N<sup>T</sup>, let C*k* denote the configuration that maps ℓ0 to *k* and all other locations to (0, 0, 0). We set Win(G) = {*k* ∈ N<sup>T</sup> | C*k* is winning for System}.

**Definition 10 (game problem).** For sets <sup>N</sup>s, <sup>N</sup>e, <sup>N</sup>se <sup>⊆</sup> <sup>N</sup>, the game problem is given as follows:


One can show that parameterized vector games are equivalent to the synthesis problem in the following sense:

**Lemma 11.** For every sentence ϕ ∈ FOA[∼], there is a parameterized vector game G = (A, B, F) such that Win(ϕ) = Win(G). Conversely, for every parameterized vector game G = (A, B, F), there is a sentence ϕ ∈ FOA[∼] such that Win(G) = Win(ϕ). Both directions are effective.

Example 12. To illustrate parameterized vector games and the reduction from the synthesis problem, consider the formula ϕ′4 = ⋀θ∈T, ℓ∈Z ∃<sup>=0</sup>y. θ(y) ∧ ψ3,ℓ(y) in normal form from Example 7. For simplicity, we assume that As = {a} and Ae = {d}. That is, Z is the set of vectors ⟪a<sup>i</sup>d<sup>j</sup>⟫ ∈ L = {0,..., 3}<sup>{a,d}</sup> such that i = 2 ≠ j or j = 2 ≠ i. Figure 2 illustrates a couple of configurations C0,...,C5 : L → N<sup>T</sup>. The leftmost location in a configuration is ℓ0, the rightmost

**Fig. 2.** A play of a parameterized vector game

location ⟪a<sup>3</sup>d<sup>3</sup>⟫, the topmost one ⟪a<sup>3</sup>⟫, and the one at the bottom ⟪d<sup>3</sup>⟫. Self-loops have been omitted, and locations from Z have a gray background and a dashed border.

Towards an equivalent game G = (A, 3, F), it remains to determine the acceptance condition F. Recall that ϕ′4 says that every class contains two occurrences of a iff it contains two occurrences of d. This is reflected by the acceptance condition F = {κ} where κ(ℓ) = (=0, =0, =0) for all ℓ ∈ Z and κ(ℓ) = (≥0, ≥0, ≥0) for all ℓ ∈ L \ Z. With this, a configuration is accepting iff no token is on a location from Z (a gray location).

We can verify that Win(G) = Win(ϕ′4) = N × 0 × N. In G, a uniform winning strategy f for System that works for all P with Pe = ∅ proceeds as follows: System first awaits a move by Environment and then moves each token upwards as many locations as Environment has moved it downwards. Figure 2 illustrates an f-maximal C(6,0,0)-play that is winning for System. We note that f is a "compressed" version of the winning strategy presented in Example 4, as System makes her moves only when really needed.

## **5 Results for FO[***∼***] via Parameterized Vector Games**

In this section, we present our results for the synthesis problem for FO[∼], which we obtain by showing corresponding results for parameterized vector games. In particular, we show that (FO[∼], 0, 0, N) and (FO[∼], N, N, 0) do not have a cutoff, whereas (FO[∼], N, {ke}, {kse}) has a cutoff for all ke, kse ∈ N. Finally, we prove that Synth(FO[∼], 0, 0, N) is, in fact, undecidable.

**Lemma 13.** There is a game G = (A, B, F) such that Win(G) does not have a cutoff wrt. (0, 0, N).

Proof. We let As = {a} and Ae = {b}, as well as B = 2. For k ∈ {0, 1, 2}, define the local acceptance conditions (=k) = (=0, =0, =k) and (≥k) = (=0, =0, ≥k). Set

**Fig. 3.** Acceptance conditions for a game with no cutoff wrt. (0, 0, N)

ℓ1 = ⟪a⟫, ℓ2 = ⟪ab⟫, ℓ3 = ⟪a<sup>2</sup>b⟫, and ℓ4 = ⟪a<sup>2</sup>b<sup>2</sup>⟫. For k0,...,k4 ∈ {0, 1, 2} and ⋈0,...,⋈4 ∈ {=, ≥}, let [⋈0 k0, ⋈1 k1, ⋈2 k2, ⋈3 k3, ⋈4 k4] denote the κ ∈ C<sup>L</sup> with κ(ℓi) = (⋈i ki) for all i ∈ {0,..., 4} and κ(ℓ′) = (=0) for ℓ′ ∉ {ℓ0,...,ℓ4}. Finally,

$$\mathcal{F} = \left\{ [\geq 0, =2, =0, =0, \geq 0],\; [\geq 0, =1, =0, =1, \geq 0],\; [\geq 0, =0, =0, =2, \geq 0],\; [=0, =0, =0, =0, \geq 0] \right\} \cup K\_{\mathrm{e}},$$

where Ke = {κℓ | ℓ ∈ L such that ℓ(b) > ℓ(a)} with κℓ(ℓ′) = (≥1) if ℓ′ = ℓ, and κℓ(ℓ′) = (≥0) otherwise. This is illustrated in Figure 3.

There is a winning strategy for System from any initial configuration of size 2n: Move two tokens from ℓ0 to ℓ1, wait until Environment sends them both to ℓ2, then move them to ℓ3, wait until they are moved to ℓ4, and then repeat with two new tokens from ℓ0 until all tokens are removed from ℓ0 and Environment cannot escape F anymore. However, one can check that there is no winning strategy for initial configurations of odd size.

**Lemma 14.** There is a game G = (A, B, F) such that Win(G) does not have a cutoff wrt. (N, N, 0).

Proof. We define G such that System wins only if she has at least as many processes as Environment. Let A<sup>s</sup> = {a}, A<sup>e</sup> = {b}, and B = 2. As there are no shared processes, we can safely ignore locations with a letter from both System and Environment. We set F = {κ1, κ2, κ3, κ4} where

$$\begin{array}{lll}
\kappa\_1(\langle\!\langle a \rangle\!\rangle) = (=1, =0, =0) & \kappa\_2(\langle\!\langle a \rangle\!\rangle) = (=1, =0, =0) & \kappa\_3(\langle\!\langle a \rangle\!\rangle) = (=0, =0, =0) \\
\kappa\_1(\langle\!\langle b \rangle\!\rangle) = (=0, =0, =0) & \kappa\_2(\langle\!\langle b \rangle\!\rangle) = (=0, \geq 2, =0) & \kappa\_3(\langle\!\langle b \rangle\!\rangle) = (=0, \geq 1, =0),
\end{array}$$
κ4(ℓ0) = (=0, =0, =0), and κi(ℓ′) = (≥0, ≥0, =0) for all other ℓ′ ∈ L and i ∈ {1, 2, 3, 4}.

We now turn to the case where the number of processes that can be triggered by Environment is bounded. Note that similar restrictions are imposed in other settings to get decidability, such as limiting the environment to a finite (Boolean) domain [16] or restricting to one environment process [3,18]. We obtain decidability of the synthesis problem via a cutoff construction:

**Theorem 15.** Given ke, kse ∈ N, every game G = (A, B, F) has a cutoff wrt. (N, {ke}, {kse}). More precisely: Let K be the largest constant that occurs in F. Moreover, let Max = (ke + kse) · |Ae| · B and N̂ = |L|<sup>Max+1</sup> · K. Then, (N̂, ke, kse) is a cutoff of Win(G) wrt. (N, {ke}, {kse}).
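
For concrete parameters, the bound N̂ is easily computed (Python; argument names are ours):

```python
def cutoff_bound(ke, kse, n_env_letters, n_letters, B, K):
    """The bound from Theorem 15: |L|^(Max+1) * K with
    |L| = (B+1)^|A| and Max = (ke + kse) * |Ae| * B."""
    L = (B + 1) ** n_letters
    Max = (ke + kse) * n_env_letters * B
    return L ** (Max + 1) * K
```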

Proof. We will show that, for all <sup>N</sup> <sup>≥</sup> <sup>N</sup>ˆ,

(N, ke, kse) ∈ Win(G) ⇐⇒ (N + 1, ke, kse) ∈ Win(G).

The main observation is that, when C contains more than K tokens in a given ℓ ∈ L, adding more tokens in ℓ will not change whether C |= F. Given C, C′ ∈ Conf, we write C <e C′ if C ≠ C′ and there is τ ∈ Te such that τ(C) = C′. Note that the length d of a chain C0 <e C1 <e ... <e Cd is bounded by Max. In other words, Max is the maximal number of transitions that Environment can perform in a play. For all d ∈ {0,..., Max}, let Confd be the set of configurations C ∈ Conf such that the longest chain in (Conf, <e) starting from C has length d.

Claim. Suppose that $C \in \mathit{Conf}_d$ and $\ell \in L$ such that $C(\ell) = (N, n_e, n_{se})$ with $N \ge |L|^{d+1} \cdot K$ and $n_e, n_{se} \in \mathbb{N}$. Set $D = C[\ell \mapsto (N + 1, n_e, n_{se})]$. Then,

C is winning for System ⇐⇒ D is winning for System.

To show the claim, we proceed by induction on $d \in \mathbb{N}$; the induction step is illustrated in Figure 4. In each implication, we distinguish the cases $d = 0$ and $d \ge 1$. For the latter, we assume that the equivalence holds for all values strictly smaller than $d$.

For $\tau \in T_s$ and $\ell, \ell' \in L$, we let $\tau[(\ell, \ell', s){+}{+}]$ denote the transition $\eta \in T_s$ given by $\eta(\ell_1, \ell_2, e) = \tau(\ell_1, \ell_2, e) = 0$, $\eta(\ell_1, \ell_2, se) = \tau(\ell_1, \ell_2, se)$, $\eta(\ell_1, \ell_2, s) = \tau(\ell_1, \ell_2, s) + 1$ if $(\ell_1, \ell_2) = (\ell, \ell')$, and $\eta(\ell_1, \ell_2, s) = \tau(\ell_1, \ell_2, s)$ if $(\ell_1, \ell_2) \neq (\ell, \ell')$. We define $\tau[(\ell, \ell', s){-}{-}]$ similarly (provided $\tau(\ell, \ell', s) \ge 1$).

$\Longrightarrow$: Let $f$ be a winning strategy for System from $C \in \mathit{Conf}_d$. Let $\tau = f(C)$ and $C' = \tau(C)$. Note that $C' \models \mathcal{F}$. Since $C(\ell, s) = N \ge |L|^{d+1} \cdot K$, there is $\ell' \in L$ such that $\ell + w = \ell'$ for some $w \in A_s^{*}$ and $C'(\ell', s) = N' \ge |L|^{d} \cdot K$.

We show that $D = C[\ell \mapsto (N + 1, n_e, n_{se})]$ is winning for System by exhibiting a corresponding winning strategy $g$ from $D$ that will carefully control the position of the additional token. First, set $g(D) = \eta$ where $\eta = \tau[(\ell, \ell', s){+}{+}]$. Let $D' = \eta(D)$. We obtain $D'(\ell', s) = N' + 1$. Note that, since $N' \ge K$, the acceptance condition $\mathcal{F}$ cannot distinguish between $C'$ and $D'$. Thus, we have $D' \models \mathcal{F}$.


**Fig. 4.** Induction step in the cutoff construction

$\Longleftarrow$: Suppose $g$ is a winning strategy for System from $D$. Thus, for $\eta = g(D)$ and $D' = \eta(D)$, we have $D' \models \mathcal{F}$. Recall that $D(\ell, s) \ge (|L|^{d+1} \cdot K) + 1$. We distinguish two cases:


Let $C' = \tau(C)$. Since $D' \models \mathcal{F}$, one obtains $C' \models \mathcal{F}$.


This concludes the proof of the claim and, therefore, of Theorem 15.

**Corollary 16.** Let $k_e, k_{se} \in \mathbb{N}$ be the number of environment and the number of mixed processes, respectively. The problems $\mathrm{Game}(\mathbb{N}, \{k_e\}, \{k_{se}\})$ and $\mathrm{Synth}(\mathrm{FO}[\sim], \mathbb{N}, \{k_e\}, \{k_{se}\})$ are decidable.

In particular, by Theorem 15, the game problem can be reduced to an exponential number of acyclic finite-state games whose size (and hence the time complexity for determining the winner) is exponential in the cutoff and, therefore, doubly exponential in the size of the alphabet, the bound B, and the fixed number of processes that are controllable by the environment.
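For concreteness, the cutoff bound from Theorem 15 can be evaluated directly. The following Python sketch is ours, not part of the paper; parameter names mirror the quantities in the theorem statement. The doubly exponential behaviour mentioned above is visible once one recalls that $|L|$ is itself exponential in the alphabet.

```python
# A sketch (ours, not the paper's): the cutoff bound of Theorem 15,
# with parameter names following the theorem statement.
def cutoff(k_e, k_se, size_A_e, B, size_L, K):
    """Return the cutoff N-hat = |L|^(Max+1) * K, where
    Max = (k_e + k_se) * |A_e| * B."""
    max_env = (k_e + k_se) * size_A_e * B
    return size_L ** (max_env + 1) * K
```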

**Theorem 17.** $\mathrm{Game}(0, 0, \mathbb{N})$ and $\mathrm{Synth}(\mathrm{FO}[\sim], 0, 0, \mathbb{N})$ are undecidable.

Proof. We provide a reduction from the halting problem for 2-counter machines (2CM) to $\mathrm{Game}(0, 0, \mathbb{N})$. A 2CM $M = (Q, \Delta, c_1, c_2, q_0, q_h)$ has two counters, $c_1$ and $c_2$, a finite set of states $Q$, and a set of transitions $\Delta \subseteq Q \times \mathit{Op} \times Q$ where $\mathit{Op} = \{c_i{+}{+},\ c_i{-}{-},\ c_i{=}{=}0 \mid i \in \{1, 2\}\}$. Moreover, we have an initial state $q_0 \in Q$ and a halting state $q_h \in Q$. A configuration of $M$ is a triple $\gamma = (q, \nu_1, \nu_2) \in Q \times \mathbb{N} \times \mathbb{N}$ giving the current state and the current respective counter values. The initial configuration is $\gamma_0 = (q_0, 0, 0)$ and the set of halting configurations is $F = \{q_h\} \times \mathbb{N} \times \mathbb{N}$. For $t \in \Delta$, configuration $(q', \nu_1', \nu_2')$ is a ($t$-)successor of $(q, \nu_1, \nu_2)$, written $(q, \nu_1, \nu_2) \to_t (q', \nu_1', \nu_2')$, if there is $i \in \{1, 2\}$ such that $\nu_{3-i}' = \nu_{3-i}$ and one of the following holds: (i) $t = (q, c_i{+}{+}, q')$ and $\nu_i' = \nu_i + 1$, or (ii) $t = (q, c_i{-}{-}, q')$ and $\nu_i' = \nu_i - 1$, or (iii) $t = (q, c_i{=}{=}0, q')$ and $\nu_i = \nu_i' = 0$. A run of $M$ is a (finite or infinite) sequence $\gamma_0 \to_{t_1} \gamma_1 \to_{t_2} \ldots$ The 2CM halting problem asks whether there is a run reaching a configuration in $F$. It is known to be undecidable [34].
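To make the source problem of the reduction concrete, here is a small Python sketch (ours, not from the paper; the encodings of states and operations are ad hoc) of the 2CM semantics just defined, together with a bounded search for a halting run. Since the halting problem is undecidable, the search is necessarily only a semi-procedure.

```python
# A sketch (ours, not from the paper) of the 2CM semantics defined above.
# A configuration is a triple (q, v1, v2); a transition is (q, op, q')
# with op one of ('inc', i), ('dec', i), ('zero', i) for i in {1, 2}.
from collections import deque

def successors(gamma, delta):
    """All t-successors of configuration gamma under the transition set delta."""
    q, v1, v2 = gamma
    result = []
    for (p, op, q2) in delta:
        if p != q:
            continue
        kind, i = op
        v = [v1, v2]
        if kind == 'inc':
            v[i - 1] += 1
        elif kind == 'dec':
            if v[i - 1] == 0:
                continue  # decrement is blocked on an empty counter
            v[i - 1] -= 1
        elif kind == 'zero':
            if v[i - 1] != 0:
                continue  # the zero test only fires when the counter is 0
        result.append((q2, v[0], v[1]))
    return result

def halts(q0, delta, qh, max_steps=10_000):
    """Bounded breadth-first search for a run from (q0, 0, 0) into
    {qh} x N x N.  Only a semi-procedure: halting is undecidable, so
    the exploration is cut off after max_steps configurations."""
    seen = {(q0, 0, 0)}
    todo = deque(seen)
    while todo and max_steps > 0:
        max_steps -= 1
        gamma = todo.popleft()
        if gamma[0] == qh:
            return True
        for g in successors(gamma, delta):
            if g not in seen:
                seen.add(g)
                todo.append(g)
    return False  # no halting run found within the bound
```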

We fix a 2CM $M = (Q, \Delta, c_1, c_2, q_0, q_h)$. Let $A_s = Q \cup \Delta \cup \{a_1, a_2\}$ and $A_e = \{b\}$ with $a_1$, $a_2$, and $b$ three fresh symbols. We consider the game $\mathcal{G} = (A, B, \mathcal{F})$ with $A = A_s \uplus A_e$, $B = 4$, and $\mathcal{F}$ defined below. Let $L = \{0, \ldots, B\}^{A}$. Since there are only processes shared by System and Environment, we alleviate notation and consider that a configuration is simply a mapping $C : L \to \mathbb{N}$. From now on, to avoid confusion, we refer to configurations of the 2CM $M$ as $M$-configurations, and to configurations of $\mathcal{G}$ as $\mathcal{G}$-configurations.

Intuitively, every valid run of $M$ will be encoded as a play in $\mathcal{G}$, and the acceptance condition will enforce that, if a player in $\mathcal{G}$ deviates from a valid play, then she will lose immediately. At any point in the play, there will be at most one process with only a letter from $Q$ played, which will represent the current state in the simulated 2CM run. Similarly, there will be at most one process with only a letter from $\Delta$, to represent the transition that will be taken next. Finally, the value of counter $c_i$ will be encoded by the number of processes with exactly two occurrences of $a_i$ and two occurrences of $b$ (i.e., C(⟪aᵢ²b²⟫)).

To increase counter $c_i$, the players will move a new token to ⟪aᵢ²b²⟫, and to decrease it, they will move, together, a token from ⟪aᵢ²b²⟫ to ⟪aᵢ⁴b⁴⟫. Observe that, if $c_i$ has value 0, then C(⟪aᵢ²b²⟫) = 0 in the corresponding configuration of the game. As expected, it is then impossible to simulate the decrement of $c_i$. Environment's only role is to acknowledge System's actions by playing its (only) letter when System simulates a valid run. If System tries to cheat, she loses immediately.

Encoding an M-configuration. Let us be more formal. Suppose $\gamma = (q, \nu_1, \nu_2)$ is an $M$-configuration and $C$ a $\mathcal{G}$-configuration. We say that $C$ encodes $\gamma$ if

**–** C(⟪q⟫) = 1, C(⟪a₁²b²⟫) = ν₁, C(⟪a₂²b²⟫) = ν₂,
**–** C(ℓ) ≥ 0 for all ℓ ∈ {ℓ₀} ∪ {⟪q̂²b²⟫, ⟪t²b²⟫, ⟪aᵢ⁴b⁴⟫ | q̂ ∈ Q, t ∈ Δ, i ∈ {1, 2}},
**–** C(ℓ) = 0 for all other ℓ ∈ L.

We then write $\gamma = m(C)$. Let $C(\gamma)$ be the set of $\mathcal{G}$-configurations $C$ that encode $\gamma$. We say that a $\mathcal{G}$-configuration $C$ is valid if $C \in C(\gamma)$ for some $\gamma$.
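The encoding just defined can be stated operationally. The following sketch is ours, not the paper's: it uses ad-hoc string names for locations (the multiset ⟪a₁²b²⟫ becomes the key `'a1^2 b^2'`, a state q or a transition t names the location where only that letter was played, and `'l0'` stands for ℓ₀) and checks whether a G-configuration encodes an M-configuration.

```python
# A sketch (ours) of the encoding check, with ad-hoc string names for
# locations: e.g. <<a1^2 b^2>> becomes the key 'a1^2 b^2'.
def encodes(C, gamma, Q, Delta):
    """Does G-configuration C (a dict from location names to counts)
    encode M-configuration gamma = (q, v1, v2)?"""
    q, v1, v2 = gamma
    dont_care = {'l0'}                              # the empty location l_0
    dont_care |= {f'{p}^2 b^2' for p in Q}          # <<q^2 b^2>>
    dont_care |= {f'{t}^2 b^2' for t in Delta}      # <<t^2 b^2>>
    dont_care |= {f'a{i}^4 b^4' for i in (1, 2)}    # <<a_i^4 b^4>>
    required = {str(q): 1, 'a1^2 b^2': v1, 'a2^2 b^2': v2}
    for loc, n in C.items():
        if loc in required:
            if n != required[loc]:
                return False
        elif loc not in dont_care and n != 0:
            return False  # all other locations must be empty
    return all(C.get(loc, 0) == n for loc, n in required.items())
```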

Simulating a transition of M. Let us explain how we go from a $\mathcal{G}$-configuration encoding $\gamma$ to a $\mathcal{G}$-configuration encoding a successor $M$-configuration $\gamma'$. Observe that System cannot change by herself the $M$-configuration encoded. If, for instance, she tries to change the current state $q$, she might move one process from ℓ₀ to ⟪q′⟫, but then the $\mathcal{G}$-configuration is not valid anymore. We need to move the process in ⟪q⟫ into ⟪q²b²⟫, and this requires the cooperation of Environment.

Assume that the game is in configuration $C$ encoding $\gamma = (q, \nu_1, \nu_2)$. System will pick a transition $t$ starting in state $q$, say, $t = (q, c_1{+}{+}, q')$. From configuration $C$, System will go to the configuration $C_1$ defined by C₁(⟪t⟫) = 1, C₁(⟪a₁⟫) = 1, and C₁(ℓ) = C(ℓ) for all other ℓ ∈ L.

If the transition $t$ is correctly chosen, Environment will go to a configuration $C_2$ defined by C₂(⟪q⟫) = 0, C₂(⟪qb⟫) = 1, C₂(⟪t⟫) = 0, C₂(⟪tb⟫) = 1, C₂(⟪a₁⟫) = 0, C₂(⟪a₁b⟫) = 1 and, for all other ℓ ∈ L, C₂(ℓ) = C₁(ℓ). This means that Environment moves the processes in locations ⟪t⟫, ⟪q⟫, ⟪a₁⟫ to locations ⟪tb⟫, ⟪qb⟫, ⟪a₁b⟫, respectively.

To finish the transition, System will now move a process to the destination state $q'$ of $t$, and go to configuration $C_3$ defined by C₃(⟪q′⟫) = 1, C₃(⟪tb⟫) = 0, C₃(⟪t²b⟫) = 1, C₃(⟪qb⟫) = 0, C₃(⟪q²b⟫) = 1, C₃(⟪a₁b⟫) = 0, C₃(⟪a₁²b⟫) = 1, and C₃(ℓ) = C₂(ℓ) for all other ℓ ∈ L.

Finally, Environment moves to configuration $C_4$ given by C₄(⟪t²b⟫) = 0, C₄(⟪t²b²⟫) = C₃(⟪t²b²⟫) + 1, C₄(⟪q²b⟫) = 0, C₄(⟪q²b²⟫) = C₃(⟪q²b²⟫) + 1, C₄(⟪a₁²b⟫) = 0, C₄(⟪a₁²b²⟫) = C₃(⟪a₁²b²⟫) + 1, and C₄(ℓ) = C₃(ℓ) for all other ℓ ∈ L. Observe that $C_4 \in C((q', \nu_1 + 1, \nu_2))$.

Other types of transitions will be simulated similarly. To force System to start the simulation in $\gamma_0$, and not in an arbitrary $M$-configuration, the configurations $C$ such that C(⟪q₀²b²⟫) = 0 and C(⟪q⟫) = 1 for $q \neq q_0$ are not valid, and will be losing for System.

Acceptance condition. It remains to define $\mathcal{F}$ in a way that enforces the above sequence of $\mathcal{G}$-configurations. Let $L^-$ = {ℓ₀} ∪ {⟪aᵢ²b²⟫, ⟪aᵢ⁴b⁴⟫ | i ∈ {1, 2}} ∪ {⟪q²b²⟫ | q ∈ Q} ∪ {⟪t²b²⟫ | t ∈ Δ} be the set of elements of $L$ whose values do not affect the acceptance of a configuration. By $[\ell_1 \bowtie_1 n_1, \ldots, \ell_k \bowtie_k n_k]$, we denote $\kappa \in \mathcal{C}^L$ such that $\kappa(\ell_i) = (\bowtie_i n_i)$ for $i \in \{1, \ldots, k\}$ and $\kappa(\ell) = ({=}0)$ for all $\ell \in L \setminus \{\ell_1, \ldots, \ell_k\}$. Moreover, for a set of locations $\hat{L} \subseteq L$, we let "$\hat{L} \ge 0$" stand for "$(\ell \ge 0)$ for all $\ell \in \hat{L}$".

First, we force Environment to play only in response to System by making System win as soon as there is a process where Environment has played more letters than System (see Condition (d) in Table 2).

If γ is not halting, the configurations in C(γ) will not be winning for System. Hence, System will have to move to win (Condition (a)).

**Table 2.** Acceptance conditions for the game simulating a 2CM


The first transition chosen by System must start from the initial state of M. This is enforced by Condition (b).

Once System has moved, Environment will move other processes to leave accepting configurations. The only possible move for her is to add $b$ on a process in locations ⟪q⟫, ⟪t⟫, and ⟪aᵢ⟫ if $t$ is a transition incrementing counter $c_i$ (respectively ⟪aᵢ³b²⟫ if $t$ is a transition decrementing counter $c_i$). All other $\mathcal{G}$-configurations accessible by Environment from already defined accepting configurations are winning for System, as established in Condition (e).

System can now encode the successor configuration of M, according to the chosen transition, by moving a process to the destination state of the transition (see Condition (c)).

Finally, Environment makes the necessary transitions for the configuration to be a valid G-configuration. If she deviates, System wins (see Condition (f)).

If Environment reaches a configuration in $C(\gamma)$ for $\gamma \in F$, System can win by moving the process in ⟪qₕ⟫ to ⟪qₕ²⟫. From there, all the configurations reachable by Environment are also winning for System:

$$\mathcal{F}_F = \left\{ [⟪q_h^2⟫ = 1,\ L^- \ge 0],\ [⟪q_h^2 b⟫ = 1,\ L^- \ge 0],\ [⟪q_h^2 b^2⟫ = 1,\ L^- \ge 0] \right\}.$$

Finally, the acceptance condition is given by

$$\mathcal{F} = \bigcup_{\ell \in L^-} \mathcal{F}_\ell \ \cup \bigcup_{t = (q_0, \mathrm{op}, q') \in \Delta} \mathcal{F}_t \ \cup \bigcup_{t = (q, \mathrm{op}, q') \in \Delta} \left( \mathcal{F}_{(q,t)} \cup \mathcal{F}^{\mathrm{e}}_{(q,t)} \cup \mathcal{F}_{(q,t,q')} \cup \mathcal{F}^{\mathrm{e}}_{(q,t,q')} \right) \cup \mathcal{F}_F.$$

Note that a correct play can end in three different ways: either there is a process in ⟪qₕ⟫ and System moves it to ⟪qₕ²⟫, or System has no transition to pick, or there are not enough processes in ℓ₀ for System to simulate a new transition. Only the first kind is winning for System.

We can show that there is an accepting run in $M$ iff there is some $k \in \mathbb{N}$ such that System has a winning strategy for $\mathcal{G}$ with $(0, 0, k)$ processes.

## **6 Conclusion**

There are several questions that we left open and that are interesting in their own right due to their fundamental character. Moreover, in the decidable cases, it will be worthwhile to provide tight bounds on cutoffs and on the algorithmic complexity of the decision problem. As in [7,15,16,30,31], our strategies allow the system to have a global view of the whole program run executed so far. However, it is also perfectly natural to consider uniform local strategies where each process only sees its own actions and possibly those that are revealed according to some causal dependencies. This is, e.g., the setting considered in [3,18] for a fixed number of processes and in [25] for parameterized systems over ring architectures.

Moreover, we would like to study a parameterized version of the control problem [35] where, in addition to a specification, a program in terms of an arena is already given but has to be controlled in a way such that the specification is satisfied. Finally, our synthesis results crucially rely on the fact that the number of processes in each execution is finite. It would be interesting to consider the case with potentially infinitely many processes.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended

#### **Controlling a random population**⋆

Thomas Colcombet<sup>1</sup>, Nathanaël Fijalkow<sup>2,3</sup>, and Pierre Ohlmann<sup>1</sup>

> <sup>1</sup> Université de Paris, IRIF, CNRS, Paris, France {thomas.colcombet,pierre.ohlmann}@irif.fr <sup>2</sup> CNRS, LaBRI, Bordeaux, France nathanael.fijalkow@labri.fr

<sup>3</sup> The Alan Turing Institute of data science, London, United Kingdom

**Abstract.** Bertrand et al. introduced a model of parameterised systems, where each agent is represented by a finite state system, and studied the following control problem: for any number of agents, does there exist a controller able to bring all agents to a target state? They showed that the problem is decidable and **EXPTIME**-complete in the adversarial setting, and posed as an open problem the stochastic setting, where the agent is represented by a Markov decision process. In this paper, we show that the stochastic control problem is decidable. Our solution makes significant uses of well quasi orders, of the max-flow min-cut theorem, and of the theory of regular cost functions.

## **1 Introduction**

*The control problem for populations of identical agents.* The model we study was introduced in [3] (see also the journal version [4]): a population of agents is controlled uniformly, meaning that the controller applies the same action to every agent. The agents are represented by a finite state system, the same for every agent. The key difficulty is that there is an arbitrarily large number of agents: the control problem asks whether for every $n \in \mathbb{N}$, there exists a controller able to bring all $n$ agents synchronously to a target state.

The technical contribution of [3,4] is to prove that in the adversarial setting where an opponent chooses the evolution of the agents, the (adversarial) control problem is **EXPTIME**-complete.

In this paper, we study the stochastic setting, where each agent evolves independently according to a probabilistic distribution, *i.e.* the finite state system modelling an agent is a Markov decision process. The control problem becomes whether for every <sup>n</sup> <sup>∈</sup> <sup>N</sup>, there exists a controller able to bring all <sup>n</sup> agents synchronously to a target state with probability one.

⋆ The authors are committed to making professional choices acknowledging the climate emergency. We submitted this work to FoSSaCS for its excellence and because its location induces for us a low carbon footprint. This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No.670624), and by the DeLTA ANR project (ANR-16-CE40-0007).

Our main technical result is that the stochastic control problem is decidable. In the next paragraphs we discuss four motivations for studying this problem: control of biological systems, parameterised verification and control, distributed computing, and automata theory.

*Modelling biological systems.* The original motivation for studying this model was the control of populations of yeasts ([21]). In this application, the concentration of some molecule is monitored through fluorescence level. Controlling the frequency and duration of injections of a sorbitol solution influences the concentration of the target molecule, triggering different chemical reactions which can be modelled by a finite state system. The objective is to control the population to reach a predetermined fluorescence state. As discussed in the conclusions of [3,4], the stochastic semantics is more satisfactory than the adversarial one for representing the behaviours of the chemical reactions, so our decidability result is a step towards a better understanding of the modelling of biological systems as populations of arbitrarily many agents represented by finite state systems.

*From parameterised verification to parameterised control.* Parameterised verification was introduced in [12]: it is the verification of a system composed of an arbitrary number of identical components. The control problem we study here and introduced in [3,4] is the first step towards *parameterised control*: the goal is to control a system composed of many identical components in order to ensure a given property. To the best of our knowledge, the contributions of [3,4] are the first results on parameterised control; by extension, we present the first results on parameterised control in a stochastic setting.

*Distributed computing.* Our model resembles two models introduced for the study of distributed computing. The first and most widely studied is population protocols, introduced in [2]: the agents are modelled by finite state systems and interact by pairs drawn at random. The mode of interaction is the key difference with the model we study here: in a time step, all of our agents perform simultaneously and independently the same action. This brings us closer to broadcast protocols as studied for instance in [8], in which one action involves an arbitrary number of agents. As explained in [3,4], our model can be seen as a subclass of (stochastic) broadcast protocols, but key differences exist in the semantics, making the two bodies of work technically independent.

The focus of the distributed computing community when studying population or broadcast protocols is to construct the most efficient protocols for a given task, such as (prominently) electing a leader. A growing literature from the verification community focusses on checking the correctness of a given protocol against a given specification; we refer to the recent survey [7] for an overview. We concentrate on the control problem, which can then be seen as a first result in the control of distributed systems in a stochastic setting.

*Alternative semantics for probabilistic automata.* It is very tempting to consider the limit case of infinitely many agents: the parameterised control question becomes the value 1 problem for probabilistic automata, which was proved undecidable in [13], and even in very restricted cases ([10]). Hence abstracting continuous distributions by a discrete population of arbitrary size can be seen as an approximation technique for probabilistic automata. Using $n$ agents corresponds to using numerical approximation up to $2^{-n}$ with random rounding; in this sense the control problem considers arbitrarily fine approximations. The plague of undecidability results on probabilistic automata (see *e.g.* [9]) is nicely contrasted by our positive result, which is one of the few decidability results on probabilistic automata not making structural assumptions on the underlying graph.

*Our results.* We prove decidability of the stochastic control problem. The first insight is given by the theory of well quasi orders, which motivates the introduction of a new problem called the sequential flow problem. The first step of our solution is to reduce the stochastic control problem to (many instances of) the sequential flow problem. The second insight comes from the theory of regular cost functions, providing us with a set of tools for addressing the key difficulty of the problem, namely the fact that there are arbitrarily many agents. Our key technical contribution is to show the computability of the sequential flow problem by reducing it to a boundedness question expressed in cost monadic second order logic using the max-flow min-cut theorem.

*Related work.* The notion of decisive Markov chains was introduced in [1] as a unifying property for studying infinite-state Markov chains with finite-like properties. A typical example of decisive Markov chains is lossy channel systems, where tokens can be lost at any time, inducing monotonicity properties. Our situation is the exact opposite, as we are considering (using the Petri nets terminology) safe Petri nets where the number of tokens along a run is constant. So it is not clear whether the underlying argument in both cases can be unified using decisiveness.

*Organisation of the paper.* We define the stochastic control problem in Section 2, and the sequential flow problem in Section 3. We construct a reduction from the former to (many instances of) the latter in Section 4, and show the decidability of the sequential flow problem in Section 5.

## **2 The stochastic control problem**

**Definition 1.** *<sup>A</sup>* Markov decision process *(*MDP *for short) consists of*


The interpretation of the transition table is that from the state p under action a, the probability to transition to q is ρ(p, a)(q). The *transition relation* Δ is defined by

$$\Delta = \left\{(p, a, q) \in \mathcal{Q} \times \mathcal{A} \times \mathcal{Q} : \rho(p, a)(q) > 0\right\}.$$

We also use Δ<sup>a</sup> given by {(p, q) ∈ Q × Q : (p, a, q) ∈ Δ}.
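As a small illustration (ours, not the paper's), the definitions above translate directly into code once the transition table ρ is encoded as a dictionary from (state, action) pairs to probability distributions over states.

```python
# A sketch (ours): an MDP transition table rho mapping (state, action)
# to a distribution over states, and the induced relations Delta and
# Delta_a recording the transitions with positive probability.
def transition_relation(rho):
    """Delta = {(p, a, q) : rho(p, a)(q) > 0}."""
    return {(p, a, q) for (p, a), dist in rho.items()
            for q, pr in dist.items() if pr > 0}

def transition_relation_a(rho, a):
    """Delta_a = {(p, q) : (p, a, q) in Delta}."""
    return {(p, q) for (p, b, q) in transition_relation(rho) if b == a}

# Example: from s, action a keeps a token on s or moves it to t.
rho = {('s', 'a'): {'s': 0.5, 't': 0.5}, ('t', 'a'): {'t': 1.0}}
```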

We refer to [17] for the usual notions related to MDPs; it turns out that very little probability theory will be needed in this paper, so we restrict ourselves to mentioning only the relevant objects. In an MDP M, a strategy is a function σ : Q→A; note that we consider only pure and positional strategies, as they will be sufficient for our purposes.

Given a *source* s ∈ Q and a *target* t ∈ Q, we say that the strategy σ *almost surely* reaches t if the probability that a path starting from s and consistent with σ eventually leads to t is 1. As we shall recall in Section 4, whether there exists a strategy ensuring to reach t almost surely from s, called the *almost sure reachability problem* for MDPs, can be reduced to solving a two-player Büchi game, and in particular does not depend upon the exact probabilities. In other words, the only relevant information for each (p, a, q) ∈ Q×A×Q is whether ρ(p, a)(q) > 0 or not. Since the same will be true for the stochastic control problem we study in this paper, in our examples we do not specify the exact probabilities, and an edge from p to q labelled a means that ρ(p, a)(q) > 0.
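Since only the support of ρ matters, almost-sure reachability admits a purely graph-based computation. The following sketch is ours (it implements the standard fixpoint algorithm rather than the paper's reduction to Büchi games, and encodes ρ as a dictionary from (state, action) pairs to distributions): repeatedly restrict the candidate set to states that can reach the target with positive probability using only actions whose whole support stays inside the candidate set.

```python
# A sketch (ours) of the standard graph-based algorithm for almost-sure
# reachability in a finite MDP; only the support of rho is used.
def almost_sure_reach(states, actions, rho, targets):
    """Largest W such that, from every state of W, some strategy reaches
    `targets` almost surely without ever leaving W."""
    W = set(states)
    while True:
        # States reaching `targets` with positive probability via actions
        # whose whole support stays inside W (a layered attractor).
        reach = set(targets) & W
        changed = True
        while changed:
            changed = False
            for p in W - reach:
                for a in actions:
                    dist = rho.get((p, a))
                    if dist is None:
                        continue
                    supp = {q for q, pr in dist.items() if pr > 0}
                    if supp <= W and supp & reach:
                        reach.add(p)
                        changed = True
                        break
        if reach == W:
            return W
        W = reach
```

On the MDP of the example above extended with a sink ⊥ reachable via a second action, the computation correctly excludes the sink.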

Let us now fix an MDP M and consider a population of n *tokens* (we use tokens to represent the agents). Each token evolves in an independent copy of the MDP M. The controller acts through a *strategy* σ : Qⁿ → A, meaning that given the state each of the n tokens is in, the controller chooses *one* action to be performed by all tokens independently. Formally, we are considering the product MDP Mⁿ whose set of states is Qⁿ, set of actions is A, and transition table is $\rho^n(u, a)(v) = \prod_{i=1}^{n} \rho(u_i, a)(v_i)$, where u, v ∈ Qⁿ and uᵢ, vᵢ are the i-th components of u and v.
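Continuing the illustration (ours, not the paper's; ρ is encoded as a dictionary from (state, action) pairs to distributions), the support of the product transition table ρⁿ(u, a) is the componentwise product of the supports of the ρ(uᵢ, a):

```python
# A sketch (ours) of the product construction M^n on supports.
from itertools import product

def product_support(rho, u, a):
    """States v with rho^n(u, a)(v) > 0, for u a tuple of token states."""
    supports = [[q for q, pr in rho[(ui, a)].items() if pr > 0] for ui in u]
    return set(product(*supports))
```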

Let s, t ∈ Q be the source and target states; we write sⁿ and tⁿ for the constant n-tuples where all components are s and t, respectively. For a fixed value of n, whether there exists a strategy ensuring to reach tⁿ almost surely from sⁿ can be reduced to solving a two-player Büchi game in the same way as above for a single MDP, replacing M by Mⁿ. The stochastic control problem asks whether this is true for arbitrary values of n:

*Problem 1 (Stochastic control problem).* The inputs are an MDP M, a source state s ∈ Q and a target state t ∈ Q. The question is whether for all $n \in \mathbb{N}$, there exists a strategy ensuring to reach tⁿ almost surely from sⁿ.

Our main result is the following.

**Theorem 1.** *The stochastic control problem is decidable.*

The fact that the problem is co-recursively enumerable is easy to see: if the answer is "no", there exists $n \in \mathbb{N}$ such that there exists no strategy ensuring to reach tⁿ almost surely from sⁿ. Enumerating the values of n and solving the almost sure reachability problem for Mⁿ eventually finds this out. However, it is not clear whether one can place an upper bound on such a witness n, which would yield a simple (yet inefficient!) algorithm. As a corollary of our analysis we can indeed derive such an upper bound, but it is non-elementary in the size of the MDP.

In the remainder of this section we present a few interesting examples.

**Example 1** Let us consider the MDP represented in Figure 1. We show that for this MDP, for any $n \in \mathbb{N}$, the controller has an almost-sure strategy to reach tⁿ from sⁿ. Starting with n tokens on s, we iterate the following strategy:


The first step is eventually successful with probability one, since at each iteration there is a positive probability that the number of tokens in state q increases. In the second step, with non-zero probability at least one token goes to t, while the rest go back to s. It follows that each iteration of this strategy increases, with non-zero probability, the number of tokens in t. Hence, all tokens are eventually transferred to t almost surely.

**Fig. 1.** The controller can almost surely reach tⁿ from sⁿ, for any $n \in \mathbb{N}$.

**Example 2** We now consider the MDP represented in Figure 2. By convention, if from a state some action does not have any outgoing transition (for instance the action u from s), then it goes to the sink state ⊥.

We show that there exists a controller ensuring to transfer seven tokens from s to t, but that the same does not hold for eight tokens. For the first assertion, we present the following strategy:


**–** Play i₃ ∈ {u, d}. The remaining token (if any) goes to t.

Now assume that there are 8 tokens or more on s. The only choices for a strategy are to play u or d on the second, fourth, and sixth moves. First, with non-zero probability at least 4 tokens are in each of q₁ⁱ for i ∈ {u, d}. Then, whatever the choice of action i ∈ {u, d}, there are at least 4 tokens in q₁ after the next step. Proceeding likewise, there are at least 2 tokens in q₂ with non-zero probability two steps later. Then again two steps later, at least 1 token falls in the sink with non-zero probability.

**Fig. 2.** The controller can synchronise up to 7 tokens on the target state <sup>t</sup> almost surely, but not more.

Generalising this example shows that if the answer to the stochastic control problem is "no", the smallest number of tokens n for which there exists no almost-sure strategy for reaching tⁿ from sⁿ may be exponential in |Q|. This can be further extended to show a doubly exponential (in |Q|) lower bound, as done in [3,4]; the example produced there holds for both the adversarial and the stochastic settings. Interestingly, for the adversarial setting this doubly exponential lower bound is tight. Our proof for the stochastic setting yields a non-elementary bound, leaving a very large gap.

**Example 3** We consider the MDP represented in Figure 3. For any $n \in \mathbb{N}$, there exists a strategy almost surely reaching tⁿ from sⁿ. However, this strategy has to pass tokens one by one through q₁. We iterate the following strategy:


Note that the first step may take a very long time (the expected number of a's to be played until this happens is exponential in the number of tokens), but it is eventually successful with probability one. This very slow strategy is necessary: if q₁ contains at least two tokens, then action b should not be played: with non-zero probability, at least one token ends up in each of q_l and q_r, so at the next step some token ends up in ⊥. It follows that any strategy almost surely reaching tⁿ has to be able to detect the presence of at most 1 token in q₁. This is a key example for understanding the difficulty of the stochastic control problem.

**Fig. 3.** The controller can synchronise any number of tokens almost surely on the target state t, but they have to go one by one.

## **3 The sequential flow problem**

We let $\mathcal{Q}$ be a finite set of states. We call *configuration* an element of $\mathbb{N}^{\mathcal{Q}}$ and *flow* an element $f \in \mathbb{N}^{\mathcal{Q} \times \mathcal{Q}}$. A flow $f$ induces two configurations $\mathrm{pre}(f)$ and $\mathrm{post}(f)$ defined by

$$\text{pre}(f)(p) = \sum\_{q \in \mathcal{Q}} f(p, q) \qquad \text{and} \qquad \text{post}(f)(q) = \sum\_{p \in \mathcal{Q}} f(p, q).$$

Given two configurations $c, c'$ and a flow $f$, we say that $c$ *goes to* $c'$ using $f$, and write $c \to_f c'$, if $c = \mathrm{pre}(f)$ and $c' = \mathrm{post}(f)$.

A *flow word* is $\bar{f} = f_1 \ldots f_\ell$ where each $f_i$ is a flow. We write $c \leadsto^{\bar{f}} c'$ if there exists a sequence of configurations $c = c_0, c_1, \ldots, c_\ell = c'$ such that $c_{i-1} \to_{f_i} c_i$ for all $i \in \{1, \ldots, \ell\}$. In this case, we say that $c$ goes to $c'$ using the flow word $\bar{f}$.
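These definitions transcribe directly into code. The following sketch is ours: a flow is a nested dictionary `f[p][q]` of token counts, a configuration a dictionary from states to counts, and for the equality tests we assume every state of 𝒬 that carries a token appears explicitly as a key.

```python
# A direct transcription (ours) of pre(f), post(f), and flow words.
def pre(f):
    """pre(f)(p) = sum over q of f(p, q)."""
    return {p: sum(row.values()) for p, row in f.items()}

def post(f):
    """post(f)(q) = sum over p of f(p, q)."""
    out = {q: 0 for row in f.values() for q in row}
    for row in f.values():
        for q, n in row.items():
            out[q] += n
    return out

def goes_to(c, flow_word, c2):
    """Does configuration c go to c2 using the flow word [f1, ..., fl]?"""
    cur = c
    for f in flow_word:
        if cur != pre(f):
            return False  # the flow does not consume exactly cur
        cur = post(f)
    return cur == c2
```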

We now recall some classical definitions related to well quasi-orders ([15,16], see [19] for an exposition of recent results). Let (E, ≤) be a quasi-ordered set (*i.e.* ≤ is reflexive and transitive); it is a *well quasi-ordered set* (WQO) if any infinite sequence contains an increasing pair. We say that S ⊆ E is *downward closed* if for any x ∈ S, if y ≤ x then y ∈ S. An *ideal* is a non-empty downward closed set I ⊆ E such that for all x, y ∈ I, there exists some z ∈ I satisfying both x ≤ z and y ≤ z.

**Lemma 1.** *In a WQO, any decreasing sequence of downward closed sets is eventually constant.*
We equip the set of configurations ℕ^Q and the set of flows ℕ^{Q×Q} with the quasi-order ≤ defined component-wise, yielding, thanks to Dickson's Lemma [6], two WQOs.

**Lemma 2.** *Let* X *be a finite set. A subset of* ℕ^X *is an ideal if and only if it is of the form*

$$a\downarrow\; = \{ c \in \mathbb{N}^X \mid c \leqslant a \},$$

*for some* a ∈ (ℕ ∪ {ω})^X *(in which* ω *is larger than all integers).*

We represent downward closed sets of configurations and flows by their decomposition into finitely many ideals of the form a↓ for a ∈ (ℕ ∪ {ω})^Q or a ∈ (ℕ ∪ {ω})^{Q×Q}.
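Concretely, a vector a over ℕ ∪ {ω} can be represented with ω as an infinite value; the following Python sketch (ours) checks ideal membership, and the inclusion a↓ ⊆ b↓, which holds iff a ≤ b component-wise.

```python
import math

OMEGA = math.inf  # stands in for ω: larger than every integer

def in_ideal(c, a):
    """Membership c ∈ a↓, i.e. c(x) <= a(x) for every component x."""
    return all(c.get(x, 0) <= a.get(x, 0) for x in set(c) | set(a))

def ideal_included(a, b):
    """Inclusion a↓ ⊆ b↓, which holds iff a <= b component-wise."""
    return all(a.get(x, 0) <= b.get(x, 0) for x in set(a) | set(b))

a = {"q1": OMEGA, "q2": 3}
assert in_ideal({"q1": 100, "q2": 3}, a)   # ω dominates any integer
assert not in_ideal({"q2": 4}, a)
assert ideal_included({"q1": 5, "q2": 3}, a)
assert not ideal_included(a, {"q1": 7, "q2": 3})
```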

*Problem 2 (Sequential flow problem).* Let Q be a finite set of states. Given a downward closed set of flows *Flows* ⊆ ℕ^{Q×Q} and a downward closed set of final configurations F ⊆ ℕ^Q, compute the downward closed set

$$\operatorname{Pre}^*(\mathit{Flows}, F) = \{ c \in \mathbb{N}^{Q} \mid c \leadsto^f c' \in F,\ f \in \mathit{Flows}^* \},$$

i.e. the configurations from which one may reach F using only flows from *Flows*.
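Since pre(f) and post(f) have the same total, flows preserve the number of tokens, so for a fixed token count the configuration space is finite and reachability can be explored naively. The sketch below is ours (the general problem over all of ℕ^Q is what the rest of the paper solves symbolically): it checks whether F is reachable from one configuration, with the downward closed set of flows given by ideal generators (capacities).

```python
import math
from collections import Counter
from itertools import product

OMEGA = math.inf  # stands in for ω

def distributions(k, slots, caps):
    """All ways of splitting k tokens over `slots`, respecting per-slot caps."""
    if not slots:
        if k == 0:
            yield {}
        return
    q, rest = slots[0], slots[1:]
    top = k if caps[q] == OMEGA else min(k, caps[q])
    for n in range(int(top) + 1):
        for d in distributions(k - n, rest, caps):
            yield {q: n, **d} if n else d

def successors(conf, capacity):
    """Configurations reachable from `conf` by one flow f <= capacity
    with pre(f) = conf."""
    per_state = []
    for p, k in conf.items():
        caps = {q: v for (pp, q), v in capacity.items() if pp == p}
        per_state.append(list(distributions(k, sorted(caps), caps)))
    for combo in product(*per_state):
        out = Counter()
        for d in combo:
            out.update(d)
        yield frozenset(out.items())

def reachable(conf, capacities, targets, max_steps=20):
    """Can some configuration in `targets` be reached from `conf`, using at
    each step a flow below one of the given capacities?"""
    frontier = {frozenset(Counter(conf).items())}
    seen = set(frontier)
    for _ in range(max_steps):
        if frontier & targets:
            return True
        frontier = {d for c in frontier for cap in capacities
                    for d in successors(dict(c), cap)} - seen
        seen |= frontier
    return bool(frontier & targets)

# One capacity: move at most one token from s to t per step, idle freely.
cap = {("s", "t"): 1, ("s", "s"): OMEGA, ("t", "t"): OMEGA}
assert reachable({"s": 2}, [cap], {frozenset({("t", 2)})})
assert not reachable({"s": 2}, [cap], {frozenset({("t", 3)})})  # tokens conserved
```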

## **4 Reduction of the stochastic control problem to the sequential flow problem**

Let us consider an MDP M and a target t ∈ Q. We first recall a folklore result reducing the almost-sure reachability question for MDPs to solving a two-player Büchi game (we refer to [14] for the definitions and notations of Büchi games). The Büchi game is played between *Eve* and *Adam* as follows. From a state p:


The Büchi objective is satisfied (meaning Eve wins) if either the target state t is reached or Adam interrupts infinitely many times.

**Lemma 3.** *There exists a strategy ensuring almost surely to reach* t *from* s *if and only if Eve has a winning strategy from* s *in the above Büchi game.*

We now explain how this reduction can be extended to the stochastic control problem. Let us consider an MDP M and a target t ∈ Q. We define an infinite Büchi game G_M. The set of vertices is the set of configurations ℕ^Q. For a flow f, we write supp(f) = {(p, q) ∈ Q² : f(p, q) > 0}. The game is played as follows from a configuration c:


- Eve chooses an action a ∈ A and a flow f such that pre(f) = c and supp(f) ⊆ Δ_a. Adam then either decides to

- *agree*, and the game continues from c′ = post(f), or to

- *interrupt* and choose a flow f′ such that pre(f′) = c and supp(f′) ⊆ Δ_a; the game then continues from c′ = post(f′).

Note that Eve choosing a flow f is equivalent to choosing for each token a transition (p, q) ∈ Δ_a, inducing the configuration c′, and similarly for Adam should he decide to interrupt.

Eve wins if either all tokens are in the target state, or if Adam interrupts infinitely many times.

Note that although the game is infinite, it is actually a disjoint union of finite games. Indeed, along a play the number of tokens is fixed, so each play remains within the finite set of configurations with n tokens, for some n ∈ ℕ.

**Lemma 4.** *Let* c *be a configuration with* n *tokens in total; the following are equivalent:*

- *there exists a strategy in* M *ensuring almost surely that all* n *tokens reach* t *from* c*;*

- *Eve has a winning strategy from* c *in* G_M*.*
Lemma 4 follows from applying Lemma 3 to the product MDP M^n.

We also consider the game G^(i)_M for i ∈ ℕ, which is defined just as G_M except for the winning objective: Eve wins in G^(i)_M if either all tokens are in the target state, or Adam interrupts more than i times. It is clear that if Eve has a winning strategy in G_M then she has a winning strategy in G^(i)_M. Conversely, the following result states that G^(i)_M is equivalent to G_M for some i.

**Lemma 5.** *There exists* i ∈ ℕ *such that from any configuration* c ∈ ℕ^Q*, Eve has a winning strategy in* G_M *if and only if Eve has a winning strategy in* G^(i)_M*.*

*Proof:* Let X^(i) ⊆ ℕ^Q be the winning region for Eve in G^(i)_M. We first argue that X = ⋂_i X^(i) is the winning region in G_M. It is clear that the winning region is contained in X: if Eve has a strategy to ensure that either all tokens are in the target state, or that Adam interrupts infinitely many times, then in particular this is true for Adam interrupting more than i times, for any i. The converse inclusion holds because G_M is a disjoint union of finite Büchi games. Indeed, in a finite Büchi game, since Adam can restrict himself to playing a memoryless winning strategy, if Eve can ensure that he interrupts a certain number of times (larger than the size of the game), then by a simple pumping argument this implies that Adam will interrupt infinitely many times.

To conclude, we note that each X^(i) is downward closed: indeed, a winning strategy from a configuration c can be used from a configuration c′ with fewer tokens in each state. It follows that (X^(i))_{i≥0} is a decreasing sequence of downward closed sets in ℕ^Q, hence it stabilises thanks to Lemma 1, *i.e.* there exists i₀ ∈ ℕ such that X^(i₀) = ⋂_i X^(i), which concludes the proof.

Note that Lemma 4 and Lemma 5 substantiate the claims made in Section 2: pure positional strategies are enough, and the answer to the stochastic control problem does not depend upon the exact probabilities in the MDP. Indeed, the construction of the Büchi games does not depend on them, and the answer to the stochastic control problem is equivalent to determining whether Eve has a winning strategy in each of them.

We are now fully equipped to show that a solution to the sequential flow problem yields the decidability of the stochastic control problem.

Let F be the set of configurations in which all tokens are in state t. We let X^(i) ⊆ ℕ^Q denote the winning region for Eve in the game G^(i)_M. Note first that X^(0) = Pre*(*Flows*⁰, F), where

$$Flows^0 = \{ f \in \mathbb{N}^{\mathcal{Q} \times \mathcal{Q}} : \exists a \in \mathcal{A}, \text{ supp}(f) \subseteq \Delta\_a \}.$$

Indeed, in the game G^(0)_M Adam cannot interrupt, as this would make him lose immediately. Hence, the winning region for Eve in G^(0)_M is Pre*(*Flows*⁰, F).

We generalise this by defining, for all i > 0, *Flows*^i to be the set of flows f ∈ ℕ^{Q×Q} such that for some action a ∈ A,

- supp(f) ⊆ Δ_a, and

- for all f′ with pre(f′) = pre(f) and supp(f′) ⊆ Δ_a, we have post(f′) ∈ X^(i−1).

Equivalently, this is the set of flows for which, when played by Eve in the game G_M, Adam cannot use an interrupt move to force the configuration outside of X^(i−1).

We now claim that

$$X^{(i)} = \operatorname{Pre}^*(\mathit{Flows}^i, F)$$

for all i ≥ 0.

We note that this means that, for each i, computing X^(i) reduces to solving one instance of the sequential flow problem. This induces an algorithm for solving the stochastic control problem: compute the sequence (X^(i))_{i≥0} until it stabilises, which is ensured by Lemma 5 and yields the winning region of G_M. The answer to the stochastic control problem is then whether the initial configuration, in which all tokens are in s, belongs to the winning region of G_M.

Let us prove the claim by induction on i.

Let c be a configuration in Pre*(*Flows*^i, F). This means that there exists a flow word f = f_1 ⋯ f_ℓ such that f_k ∈ *Flows*^i for all k, and c ⤳^f c′ ∈ F. Expanding the definition, there exist c_0 = c, …, c_ℓ = c′ such that c_{k−1} →^{f_k} c_k for all k.

Let us now describe a strategy for Eve in G^(i)_M starting from c. As long as Adam agrees, Eve successively chooses the sequence of flows f_1, f_2, … and the corresponding configurations c_1, c_2, …. If Adam never interrupts, then the game reaches the configuration c′ ∈ F, and Eve wins. Otherwise, as soon as Adam interrupts, by definition of *Flows*^i we reach a configuration d ∈ X^(i−1). By induction hypothesis, Eve has a strategy which ensures from d either to reach F or that Adam interrupts at least i − 1 times. In the latter case, adding the interrupt move leading to d yields i interrupts, so this is a winning strategy for Eve in G^(i)_M, witnessing that c ∈ X^(i).

Conversely, assume that there is a winning strategy σ for Eve in G^(i)_M from a configuration c. Consider a play consistent with σ: it either reaches F or Adam interrupts at some point. Let us denote by f = f_1 f_2 ⋯ f_ℓ the sequence of flows played until then. We argue that f_k ∈ *Flows*^i for all k ∈ {1, …, ℓ}. Let f = f_k for some k; by definition of the game, supp(f) ⊆ Δ_a for some action a. Let f′ be such that pre(f′) = pre(f) and supp(f′) ⊆ Δ_a. In the game G_M, after Eve has played f_k, Adam has the possibility to interrupt and choose f′. From this configuration onward the strategy σ is winning in G^(i−1)_M, implying that post(f′) ∈ X^(i−1), and hence f_k ∈ *Flows*^i. Thus, considering the play in which Adam always agrees, which must reach F, the flow word f = f_1 f_2 ⋯ f_ℓ is a witness that c ∈ Pre*(*Flows*^i, F).

## **5 Computability of the sequential flow problem**

Let Q be a finite set of states, *Flows* ⊆ ℕ^{Q×Q} a downward closed set of flows, and F ⊆ ℕ^Q a downward closed set of configurations. The sequential flow problem is to compute the downward closed set Pre* defined by

$$\operatorname{Pre}^*(\mathit{Flows}, F) = \{ c \in \mathbb{N}^{Q} \mid c \leadsto^f c' \in F,\ f \in \mathit{Flows}^* \},$$

*i.e.* the configurations from which one may reach F using only flows from *Flows*.

The following classical result of [22] allows us to further reduce our problem.

**Lemma 6.** *The task of computing a downward closed set can be reduced to the task of deciding whether a given ideal is included in a downward closed set.*

Thanks to Lemma 6, it is sufficient for solving the sequential flow problem to establish the following result.

**Lemma 7.** *Let* I *be an ideal of the form* a↓ *for* a ∈ (ℕ ∪ {ω})^Q*, and let Flows* ⊆ ℕ^{Q×Q} *be a downward closed set of flows. It is decidable whether* F *can be reached from all configurations of* I *using only flows from Flows.*

We call a vector a ∈ (ℕ ∪ {ω})^{Q×Q} a *capacity*. A *capacity word* is a finite sequence of capacities. For two capacity words w, w′ of the same length, we write w ≤ w′ to mean that w_i ≤ w′_i for each i. Since flows are particular cases of capacities, we compare flows with capacities in the same way.

Before proving Lemma 7 let us give an example and some notations.

Given a state q, we write q ∈ ℕ^Q for the vector which has value 1 on the q component and 0 elsewhere. More generally, we let αq, for α ∈ ℕ ∪ {ω}, denote the vector with value α on the q component and 0 elsewhere. We use similar notations for flows. For instance, ωq1 + q2 has value ω in the q1 component, 1 in the q2 component, and 0 elsewhere.

In the instance of the sequential flow problem represented in Figure 4, we ask the following question: can F be reached from any configuration of I = (ωq2)↓? The answer is yes: the capacity word w = (ac^{n−1}b)^n is such that nq2 ⤳^f nq4 ∈ F for some flow word f ≤ w, the beginning of which is described in Figure 5.

**Fig. 4.** An instance of the sequential flow problem. We let *Flows* = a↓ ∪ b↓ ∪ c↓ where a = ω(q2, q2) + (q2, q3) + ω(q4, q4), b = ω(q1, q2) + (q3, q4) + ω(q4, q4), and c = ω(q1, q1) + (q2, q1) + ω(q2, q2) + ω(q3, q3) + ω(q4, q4). Set also F = (ωq4)↓.

**Fig. 5.** A flow word f = f_1 f_2 … f_{n+1} ≤ ac^{n−1}b such that nq2 goes to (n − 1)q1 + q4 using f. This construction can be extended to f ≤ w such that nq2 goes to nq4 using f.

We write a[ω ← n] for the configuration obtained from a by replacing all ωs by n.

The key idea for solving the sequential flow problem is to rephrase it using *regular cost functions* (a set of tools for solving boundedness questions). Indeed, whether F can be reached from all configurations of I = a ↓ using only flows from *Flows* can be equivalently phrased as a boundedness question, as follows:

does there exist a bound on the values of n ∈ ℕ such that a[ω ← n] ⤳^f c for some c ∈ F and f ∈ *Flows*∗?

We show that this boundedness question can be formulated as a boundedness question for a formula of *cost monadic logic*, a formalism that we introduce now. We assume that the reader is familiar with *monadic second-order logic* (MSO) over finite words, and refer to [20] for the definitions. The syntax of cost monadic logic (cost MSO for short) extends MSO with the construct |X| ≤ N, where X is a second-order variable and N is a bounding variable. The semantics is defined as usual: w, n ⊨ ϕ for a word w ∈ A∗, with n ∈ ℕ specifying the bound N. We assume that there is at most one bounding variable, and that the construct |X| ≤ N appears positively, *i.e.* under an even number of negations. This ensures that the larger N is, the "more true" the formula is: if w, n ⊨ ϕ, then w, n′ ⊨ ϕ for all n′ ≥ n. The semantics of a formula ϕ of cost MSO induces a function ⟦ϕ⟧ : A∗ → ℕ ∪ {∞} defined by ⟦ϕ⟧(w) = inf {n ∈ ℕ | w, n ⊨ ϕ}.
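As a small illustration (ours, not taken from the paper), fix a letter a ∈ A and consider the formula counting its occurrences:

$$\varphi_a \;=\; \exists X\, \Big( \forall x\, \big( x \in X \leftrightarrow a(x) \big) \;\wedge\; |X| \le N \Big).$$

Then w, n ⊨ ϕ_a holds exactly when the number of positions carrying a is at most n, so ⟦ϕ_a⟧(w) = |w|_a; in particular ϕ_a is bounded over a language L if and only if the number of a's occurring in words of L is bounded.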

The *boundedness problem* for cost monadic logic is the following: given a cost MSO formula ϕ over A∗, is the function ⟦ϕ⟧ : A∗ → ℕ ∪ {∞} bounded, *i.e.*:

$$\exists n \in \mathbb{N},\ \forall w \in A^*,\quad w, n \models \varphi\,?$$

The decidability of the boundedness problem is a central result in the theory of regular cost functions ([5]). Since in this theory we are only interested in whether functions are bounded or not, we consider functions "up to boundedness properties". Concretely, this means that a *cost function* is an equivalence class of functions A∗ → ℕ ∪ {∞}, with the equivalence f ≈ g if there exists α : ℕ → ℕ such that f(w) is finite if and only if g(w) is finite, and in this case f(w) ≤ α(g(w)) and g(w) ≤ α(f(w)). This is equivalent to stating that for all X ⊆ A∗, f is bounded over X if and only if g is bounded over X.

Let us now establish Lemma 7.

*Proof:* Let T = {q ∈ Q | a(q) = ω}. Note that for n sufficiently large, we have a[ω ← n]↓ = I ∩ {0, 1, …, n}^Q. We let C ⊆ (ℕ ∪ {ω})^{Q×Q} be the decomposition of *Flows* into ideals, that is, C is the minimal finite set such that

$$\mathit{Flows} = \bigcup_{b \in C} b\downarrow.$$

We let k denote the largest finite value that appears in the definition of C, that is, k = max{b(q, q′) : b ∈ C, q, q′ ∈ Q, b(q, q′) ≠ ω}.

Let us define the function

$$\begin{array}{rcl} \Phi : C^* & \longrightarrow & \mathbb{N} \cup \{\omega\} \\ w & \longmapsto & \sup \{ n \in \mathbb{N} : \exists f \leqslant w,\ a[\omega \leftarrow n] \leadsto^f F \}. \end{array}$$

By definition, Φ is unbounded if and only if F can be reached from all configurations of I. Since boundedness of cost MSO is decidable, it suffices to construct a formula in cost monadic logic for Φ to obtain the decidability of our problem. Our approach is to additively decompose the capacity word w into a finitary part w^(fin) (which is handled using a regular language) and several unbounded parts w^(s), one for each s ∈ T. The unbounded parts require a more careful analysis, which notably goes through the use of the max-flow min-cut theorem.

Note that a[ω ← n] decomposes as the sum of its finite part a_fin = a[ω ← 0] and Σ_{s∈T} ns. Since flows are additive, f ≤ w = w_1 … w_ℓ is a flow word from a[ω ← n] to F if and only if the capacity word w may be decomposed into (w^(s))_{s∈T} = (w^(s)_1 … w^(s)_ℓ)_{s∈T} and w^(fin) = w^(fin)_1 … w^(fin)_ℓ such that


In order to encode such capacity words in cost MSO we use monadic variables W^(s)_{q,q′,p} where q, q′ ∈ Q, p ∈ {0, …, k, ω} and s ∈ T ∪ {fin}. They are meant to satisfy that i ∈ W^(s)_{q,q′,p} if and only if w^(s)_i(q, q′) = p. We use bold **W** to denote the tuple (W^(s)_{q,q′,p})_{q,q′,p,s}, and **W**^(s) for (W^(s)_{q,q′,p})_{q,q′,p} when s ∈ T ∪ {fin} is fixed. The MSO formula IsDecomp(**W**, w) states that a decomposition (w^(s))_{s∈T∪{fin}} is semantically valid and sums to w:

$$\begin{aligned} \forall i, \quad &\left[\bigwedge_{q,q',s} \bigvee_{p \in \{0,\dots,k,\omega\}} \left(i \in W^{(s)}_{q,q',p} \wedge \bigwedge_{p' \neq p} i \notin W^{(s)}_{q,q',p'}\right)\right] \\ \land\ &\left[\bigwedge_{q,q',p} \left( w_i(q,q') = p \implies \bigvee_{\substack{(p_s)_{s \in T \cup \{\mathrm{fin}\}} \\ \sum_s p_s = p}} \ \bigwedge_{s \in T \cup \{\mathrm{fin}\}} i \in W^{(s)}_{q,q',p_s}\right)\right] \end{aligned}$$

For s ∈ T, we now consider the function

$$\begin{aligned} \Psi^{(s)}: \left( \{0, 1, \ldots, k, \omega\}^{Q\times Q} \right)^{*} &\longrightarrow \mathbb{N} \cup \{\omega\} \\ w^{(s)} &\longmapsto \sup \{ n \in \mathbb{N} \mid \exists f \leqslant w^{(s)},\ ns \leadsto^f F \}. \end{aligned}$$

We also define Ψ^(fin) ⊆ ({0, …, k, ω}^{Q×Q})∗ to be the language of capacity words w^(fin) such that there exists a flow word f ≤ w^(fin) with a_fin ⤳^f F. Note that Ψ^(fin) is a regular language, since it is recognised by a finite automaton whose states are the bounded configurations in {0, 1, …, k|Q|}^Q, and which may update the current configuration only with flows smaller than the current letter of w^(fin).

We have

$$\Phi(w) = \sup_{n} \left[ \exists \mathbf{W},\ \mathrm{IsDecomp}(\mathbf{W}, w) \land \left( \bigwedge_{s \in T} \Psi^{(s)}(\mathbf{W}^{(s)}) \ge n \right) \land \mathbf{W}^{(\mathrm{fin})} \in \Psi^{(\mathrm{fin})} \right].$$

Hence, it is sufficient to prove that, for each s ∈ T, Ψ^(s) is definable in cost MSO.

Let us fix s and a capacity word w ∈ ({0, …, k, ω}^{Q×Q})∗ of length |w| = ℓ. Consider the finite graph G with vertex set Q × {0, 1, …, ℓ} and, for all i ≥ 1, an edge from (q, i − 1) to (q′, i) labelled by w_i(q, q′). Then Ψ^(s)(w) is the maximal flow from (s, 0) to (t, ℓ) in G. We recall that a *cut* in a graph with distinguished source s and target t is a set of edges such that removing them disconnects s and t. The *cost of a cut* is the sum of the weights of its edges. The *max-flow min-cut theorem* states that the maximal flow in a graph is exactly the minimal cost of a cut ([11]).
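For illustration, a self-contained Python implementation of the Edmonds–Karp algorithm (our sketch, not from the paper) computes such a maximal flow on a small layered graph; by the max-flow min-cut theorem, the value returned also equals the minimal cost of a cut.

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths.
    `cap` is an n x n matrix of residual capacities, mutated in place."""
    flow = 0
    while True:
        # BFS for a shortest path from s to t in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(n):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: the flow is maximal
        # Bottleneck along the path, then update residual capacities.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# Layered graph: source 0, two middle vertices 1 and 2, target 3.
layered = [[0, 2, 1, 0],
           [0, 0, 0, 1],
           [0, 0, 0, 2],
           [0, 0, 0, 0]]
assert max_flow(4, layered, 0, 3) == 2  # min cut: edges 1->3 and 0->2
```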

We now define a cost MSO formula Ψ̃^(s) which is equivalent (in terms of cost functions) to the minimal cost of a cut in the previous graph G, and thus to Ψ^(s). In the following formula, **X** = (X_{q,q′})_{q,q′∈Q} represents a cut in the graph: i ∈ X_{q,q′} means that the edge ((q, i−1), (q′, i)) belongs to the cut. Likewise, **P** = (P_{q,q′})_{q,q′∈Q} represents paths in the graph. Let Ψ̃^(s)(w) be defined by

$$\inf_{n} \left\{ \exists \mathbf{X}\ \left[ \bigwedge_{q, q'} n \ge |X_{q, q'}| \right] \land \left( \forall i, \bigwedge_{q,q'} \left( i \in X_{q, q'} \implies w_i(q, q') < \omega \right) \right) \land \mathrm{Disc}_{s, t}(\mathbf{X}, w) \right\},$$

where Disc_{s,t}(**X**, w) expresses that **X** disconnects (s, 0) and (t, ℓ) in G. Concretely, Disc_{s,t}(**X**, w) is defined by

$$\begin{aligned} \forall \mathbf{P}, \Bigg[ &\left( \forall i, \bigwedge_{q, q'} i \in P_{q, q'} \implies w_i(q, q') > 0 \right) \wedge \left( \bigvee_{q'} 0 \in P_{s, q'} \right) \wedge \left( \bigvee_{q} \ell \in P_{q, t} \right) \wedge {} \\ &\left( \forall i \ge 1, \bigwedge_{q, q'} i \in P_{q, q'} \implies \bigvee_{q''} i - 1 \in P_{q'', q} \right) \Bigg] \implies \exists i, \bigvee_{q, q'} \left( i \in X_{q, q'} \wedge i \in P_{q, q'} \right). \end{aligned}$$

Now Ψ̃^(s)(w) does not exactly define the minimal total weight Φ^(s)(w) of a cut, but rather the minimal value, over all cuts, of the maximum over (q, q′) ∈ Q² of the number of edges of the form ((q, i − 1), (q′, i)) in the cut. This is good enough for our purposes, since these two values are related by

$$
\tilde{\Psi}^{(s)}(w) \leqslant \Phi^{(s)}(w) \leqslant k|Q|^2\, \tilde{\Psi}^{(s)}(w),
$$

implying that the functions Ψ̃^(s) and Φ^(s) define the same cost function. Since Φ^(s) and Ψ^(s) coincide by the max-flow min-cut theorem, Ψ^(s) is definable in cost MSO.

## **6 Conclusions**

We showed the decidability of the stochastic control problem. Our approach uses well quasi orders and the sequential flow problem, which is then solved using the theory of regular cost functions.

Together with the original result of [3,4] in the adversarial setting, our result contributes to the theoretical foundations of parameterised control. We return to the first application of this model, the control of biological systems. As we discussed, the stochastic setting is perhaps more satisfactory than the adversarial one, although, as we saw, very complicated behaviours involving single agents emerge in the stochastic setting, which are arguably not pertinent for modelling biological systems.

We thus pose two open questions. The first is to settle the complexity status of the stochastic control problem. Very recently, [18] proved the **EXPTIME**-hardness of the problem, which is interesting because the underlying phenomena involved in this hardness result are specific to the stochastic setting (and do not apply to the adversarial setting). Our algorithm does not yield even elementary upper bounds, leaving a very large complexity gap. The second question goes towards more accurately modelling biological systems: can we refine the stochastic control problem by taking into account the synchronising time of the controller, and restrict it to reasonable bounds?

## **Acknowledgements**

We thank Nathalie Bertrand and Blaise Genest for introducing us to this fascinating problem, and the preliminary discussions at the Simons Institute for the Theory of Computing in Fall 2015.

## **References**



## **Decomposing Probabilistic Lambda-Calculi**

Ugo Dal Lago¹, Giulio Guerrieri², and Willem Heijltjes²

¹ Dipartimento di Informatica – Scienza e Ingegneria, Università di Bologna, Bologna, Italy
ugo.dallago@unibo.it

² Department of Computer Science, University of Bath, Bath, UK
{w.b.heijltjes,g.guerrieri}@bath.ac.uk

**Abstract.** A notion of probabilistic lambda-calculus usually comes with a prescribed reduction strategy, typically call-by-name or call-by-value, as the calculus is non-confluent and these strategies yield different results. This is a break with one of the main advantages of lambda-calculus: confluence, which means that results are independent of the choice of strategy. We present a probabilistic lambda-calculus where the probabilistic operator is decomposed into two syntactic constructs: a generator, which represents a probabilistic event, and a consumer, which acts on the term depending on a given event. The resulting calculus, the probabilistic event lambda-calculus, is confluent, and interprets the call-by-name and call-by-value strategies through different interpretations of the probabilistic operator into our generator and consumer constructs. We present two notions of reduction, one via fine-grained local rewrite steps, and one by generation and consumption of probabilistic events. Simple types for the calculus are essentially standard, and they ensure strong normalization. We demonstrate how we can encode call-by-name and call-by-value probabilistic evaluation.

## **1 Introduction**

Probabilistic lambda-calculi [24,22,17,11,18,9,15] extend the standard lambda-calculus with a probabilistic choice operator N ⊕_p M, which chooses N with probability p and M with probability 1 − p (throughout this paper, we let p be 1/2 and omit it). Duplication of N ⊕ M, as is wont to happen in lambda-calculus, raises a fundamental question about its semantics: do the duplicate occurrences represent the same probabilistic event, or different ones with the same probability? For example, take the term ⊤ ⊕ ⊥ that represents a coin flip between the boolean values *true* ⊤ and *false* ⊥. If we duplicate this term, do the copies represent two distinct coin flips with possibly distinct outcomes, or do they represent a single coin flip that determines the outcome for both copies? Put differently, when we duplicate ⊤ ⊕ ⊥, do we duplicate the event, or only its outcome?

In probabilistic lambda-calculus, these two interpretations are captured by the evaluation strategies of call-by-name (cbn), which duplicates events, and call-by-value (cbv), which evaluates any probabilistic choice before it is duplicated, and thus only duplicates outcomes. Consider the following example, where = tests equality of boolean values.

$$\top \quad \overset{\mathsf{cbv}}{\twoheadleftarrow} \quad (\lambda x.\, x = x)\,(\top \oplus \bot) \quad \overset{\mathsf{cbn}}{\twoheadrightarrow} \quad \top \oplus \bot$$

This situation is not ideal, for several, related reasons. Firstly, it demonstrates how probabilistic lambda-calculus is non-confluent, negating one of the central properties of the lambda-calculus, and one of the main reasons why it is the prominent model of computation that it is. Secondly, it means that a probabilistic lambda-calculus must derive its semantics from a prescribed reduction strategy, and its terms only have meaning in the context of that strategy. Thirdly, combining different kinds of probabilities becomes highly involved [15], as it would require specialized reduction strategies. These issues present themselves even in a more general setting, namely that of commutative (algebraic) effects, which in general do not commute with copying.

We address these issues by a decomposition of the probabilistic operator ⊕ into a generator $\boxed{a}$ and a choice $\overset{a}{\oplus}$, as follows.

$$N \oplus M \quad \triangleq \quad \boxed{a}.\, N \overset{a}{\oplus} M$$

Semantically, $\boxed{a}$ represents a probabilistic event that generates a boolean value recorded as a. The choice $N \overset{a}{\oplus} M$ is simply a conditional on a, choosing N if a is false and M if a is true. Syntactically, a is a boolean variable with an occurrence in $\overset{a}{\oplus}$, and $\boxed{a}$ acts as a probabilistic quantifier, binding all occurrences of a in its scope. (To capture a non-equal chance, one would attach a probability p to a generator, as $\boxed{a}_p$, though we will not do so in this paper.)

The resulting probabilistic event lambda-calculus ΛPE, which we present in this paper, is confluent. Our decomposition allows us to separate duplicating an event, represented by the generator $\boxed{a}$, from duplicating only its outcome a, through having multiple choice operators $\overset{a}{\oplus}$. In this way our calculus may interpret both original strategies, call-by-name and call-by-value, by different translations of standard probabilistic terms into ΛPE: call-by-name by the above decomposition (see also Section 2), and call-by-value by a different one (see Section 7). For our initial example, we get the following translations and reductions.

$$\mathsf{cbn}: \quad (\lambda x.\, x = x)\,(\boxed{a}.\, \top \overset{a}{\oplus} \bot) \ \rightsquigarrow_{\beta} \ (\boxed{a}.\, \top \overset{a}{\oplus} \bot) = (\boxed{b}.\, \top \overset{b}{\oplus} \bot) \ \rightsquigarrow \ \top \oplus \bot \tag{1}$$

$$\mathsf{cbv}: \quad \boxed{a}.\, (\lambda x.\, x = x)\,(\top \overset{a}{\oplus} \bot) \ \rightsquigarrow_{\beta} \ \boxed{a}.\, (\top \overset{a}{\oplus} \bot) = (\top \overset{a}{\oplus} \bot) \ \rightsquigarrow \ \top \tag{2}$$
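The probabilistic content of the two translations can be checked by exhaustive enumeration of event outcomes. The following Python sketch is ours and does not implement the calculus itself, only the distributions the two translations induce.

```python
from fractions import Fraction
from itertools import product

def distribution(term, num_events):
    """Exact distribution of `term`, a function of the boolean events
    bound by the generators, each generator being a fair coin flip."""
    dist = {}
    for events in product([False, True], repeat=num_events):
        out = term(events)
        dist[out] = dist.get(out, Fraction(0)) + Fraction(1, 2 ** num_events)
    return dist

# cbn translation (1): the generator is duplicated together with the redex,
# so the two copies of x hold two independent events a = e[0] and b = e[1].
cbn = lambda e: e[0] == e[1]
# cbv translation (2): a single event a = e[0] is generated once and
# shared by both copies of x.
cbv = lambda e: e[0] == e[0]

assert distribution(cbn, 2) == {True: Fraction(1, 2), False: Fraction(1, 2)}
assert distribution(cbv, 1) == {True: Fraction(1)}
```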

We present two reduction relations for our probabilistic constructs, both independent of beta-reduction. Our main focus will be on permutative reduction (Sections 2, 3), a small-step local rewrite relation which is computationally inefficient but gives a natural and very fine-grained operational semantics. Projective reduction (Section 6) is a more standard reduction, following the intuition that $\boxed{a}$ generates a coin flip to evaluate $\overset{a}{\oplus}$, and is coarser but more efficient.

We further prove confluence (Section 4), and we give a system of simple types and prove strong normalization for typed terms by reducibility (Section 5). Omitted proofs can be found in [7], the long version of this paper.

#### **1.1 Related Work**

Probabilistic λ-calculi have been a topic of study since the pioneering work of Saheb-Djahromi [24], the first to give the syntax and operational semantics of a λ-calculus with binary probabilistic choice. Giving well-behaved denotational models for probabilistic λ-calculi has proved challenging, as witnessed by the many contributions spanning the last thirty years: from Jones and Plotkin's early study of the probabilistic powerdomain [17], to Jung and Tix's remarkable (and mostly negative) observations [18], to the very recent encouraging results by Goubault-Larrecq [16]. A particularly well-behaved model for the probabilistic λ-calculus can be obtained by taking a probabilistic variation of Girard's coherent spaces [10], thereby obtaining full abstraction [13].

On the operational side, one could mention a study of the various ways the operational semantics of a calculus with binary probabilistic choice can be specified, namely by small-step or big-step semantics, or by inductively or coinductively defined sets of rules [9]. Termination and complexity analysis of higher-order probabilistic programs seen as λ-terms have been studied by way of type systems in a series of recent results about size [6], intersection [4], and refinement type disciplines [1]. Contextual equivalence on probabilistic λ-calculi has been studied, and compared with equational theories induced by Böhm trees [19], applicative bisimilarity [8], and environmental bisimilarity [25].

In all the aforementioned works, probabilistic λ-calculi have been taken as implicitly endowed with either call-by-name or call-by-value strategies, for the reasons outlined above. There are only a few exceptions, namely some works on Geometry of Interaction [5], Probabilistic Coherent Spaces [14], and Standardization [15], which achieve, in different contexts, a certain degree of independence from the underlying strategy, thus accommodating both call-by-name and call-by-value evaluation. The way this is achieved, however, invariably relies on Linear Logic or related concepts. This is deeply different from what we do here.

Some words of comparison with Faggian and Ronchi Della Rocca's work on confluence and standardization [15] are also in order. The main difference between their approach and the one we pursue here is that the operator $!$ in their calculus $\Lambda^!_\oplus$ plays *both* the role of a marker for duplicability and that of a checkpoint for any probabilistic choice "flowing out" of the term (*i.e.* being fired). In our calculus, we do not control duplication, but we definitely make use of checkpoints. Put another way, Faggian and Ronchi Della Rocca's work is inspired by linear logic, while our approach is inspired by deep inference, even though this is, on purpose, not evident in the design of our calculus.

Probabilistic λ-calculi can also be seen as vehicles for expressing probabilistic models in the sense of Bayesian programming [23,3]. This, however, requires an operator for modeling conditioning, which complicates the metatheory considerably, and which we do not consider here.

Our permutative reduction is a refinement of that for the call-by-name probabilistic λ-calculus [20], and is an implementation of the equational theory of (ordered) binary decision trees via rewriting [27]. Probabilistic decision trees have been proposed with a primitive binary probabilistic operator [22], but not with a decomposition as we explore here.

## **2 The Probabilistic Event** *λ***-Calculus** *Λ***PE**

**Definition 1.** The probabilistic event λ-calculus (ΛPE) is given by the following grammar, with from left to right: a variable (denoted by x, y, z, . . .), an abstraction, an application, a (labeled) choice, and a (probabilistic) generator.

$$M, N \;::=\; x \;\mid\; \lambda x.\,N \;\mid\; N\,M \;\mid\; N \overset{a}{\oplus} M \;\mid\; \boxed{a}.\,N$$

In a term $\lambda x.\,M$ the abstraction $\lambda x$ binds the free occurrences of the variable $x$ in its scope $M$, and in $\boxed{a}.\,N$ the generator $\boxed{a}$ binds the label $a$ in $N$. The calculus features a decomposition of the usual probabilistic sum $\oplus$, as follows.

$$N \oplus M \;\stackrel{\Delta}{=}\; \boxed{a}.\, N \overset{a}{\oplus} M \tag{3}$$

The generator $\boxed{a}$ represents a probabilistic event, whose outcome, a binary value in $\{0, 1\}$ represented by the label $a$, is used by the choice operator $\overset{a}{\oplus}$. That is, $\boxed{a}$ flips a coin setting $a$ to $0$ (resp. $1$), and depending on this $N \overset{a}{\oplus} M$ reduces to $N$ (resp. $M$). We will use the unlabeled choice $\oplus$ as in (3). This convention also gives the translation from a call-by-name probabilistic λ-calculus into $\Lambda_{\mathsf{PE}}$ (the interpretation of a call-by-value probabilistic λ-calculus is in Section 7).
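As a concrete illustration of the decomposition (3), terms can be encoded as nested tuples and the unlabeled sum desugared into a generator plus a labeled choice. This is a hypothetical encoding of our own (the constructors `'var'`, `'lam'`, `'app'`, `'choice'`, `'gen'` are not from the paper), sketched in Python:

```python
from itertools import count

_labels = count()

def fresh_label():
    # fresh labels model the implicit α-renaming of bound labels
    return f"a{next(_labels)}"

def prob_sum(n, m):
    """Decomposition (3): N ⊕ M  ≜  ⊞a. N ⊕ᵃ M, with a fresh label a."""
    a = fresh_label()
    return ('gen', a, ('choice', a, n, m))

top, bot = ('var', '⊤'), ('var', '⊥')
t = prob_sum(top, bot)
print(t)  # ('gen', 'a0', ('choice', 'a0', ('var', '⊤'), ('var', '⊥')))
```

Desugaring with a fresh label per occurrence of $\oplus$ is what makes the two copies of the argument in the cbn example draw independent outcomes.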

**Reduction.** Reduction in $\Lambda_{\mathsf{PE}}$ will consist of standard β-reduction $\rightsquigarrow_\beta$ plus an evaluation mechanism for generators and choice operators, which implements probabilistic choice. We will present two such mechanisms: projective reduction $\rightsquigarrow_\pi$ and permutative reduction $\rightsquigarrow_{\mathsf{p}}$. While projective reduction implements the given intuition for the generator and choice operator, we relegate it to Section 6 and make permutative reduction our main evaluation mechanism, for the reason that it is more fine-grained, and thus more general.

Permutative reduction is based on the idea that any operator distributes over the labeled choice operator (see the reduction steps in Figure 1), even other choice operators, as below.

$$(N \overset{a}{\oplus} M) \overset{b}{\oplus} P \;\sim\; (N \overset{b}{\oplus} P) \overset{a}{\oplus} (M \overset{b}{\oplus} P)$$

To orient this as a rewrite rule, we need to give priority to one label over another. Fortunately, the relative position of the associated generators $\boxed{a}$ and $\boxed{b}$ provides just that. Then to define $\rightsquigarrow_{\mathsf{p}}$, we will want every choice to belong to some generator, and make the order of generators explicit.

**Definition 2.** The set fl(N) of free labels of a term N is defined inductively by:

$$\begin{aligned} \mathfrak{fl}(x) &= \emptyset & \mathfrak{fl}(MN) &= \mathfrak{fl}(M) \cup \mathfrak{fl}(N) & \mathfrak{fl}(\lambda x.M) &= \mathfrak{fl}(M) \\ \mathfrak{fl}(\boxed{a}.M) &= \mathfrak{fl}(M) \setminus \{a\} & \mathfrak{fl}(M \overset{a}{\oplus} N) &= \mathfrak{fl}(M) \cup \mathfrak{fl}(N) \cup \{a\} \end{aligned}$$

A term $M$ is *label-closed* if $\mathfrak{fl}(M) = \emptyset$.
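Definition 2 translates directly into a recursive function. A sketch over a hypothetical tuple encoding of terms (`'var'`, `'lam'`, `'app'`, `'choice'` for $N \overset{a}{\oplus} M$, `'gen'` for $\boxed{a}.N$ — our own encoding, not the paper's):

```python
def fl(t):
    """Free labels of a term, following Definition 2."""
    tag = t[0]
    if tag == 'var':
        return set()
    if tag == 'lam':
        return fl(t[2])
    if tag == 'app':
        return fl(t[1]) | fl(t[2])
    if tag == 'choice':                 # N ⊕ₐ M: the label a occurs free
        _, a, n, m = t
        return fl(n) | fl(m) | {a}
    if tag == 'gen':                    # ⊞a.N binds a: remove it
        _, a, n = t
        return fl(n) - {a}
    raise ValueError(tag)

term = ('gen', 'a', ('choice', 'a', ('var', 'x'),
                     ('choice', 'b', ('var', 'y'), ('var', 'z'))))
print(fl(term))   # {'b'}: a is bound by the generator, b is free
```

A term is then label-closed exactly when `fl` returns the empty set.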

**Fig. 1.** Reduction Rules for <sup>β</sup>-reduction and <sup>p</sup>-reduction.

From here on, we consider only label-closed terms (we implicitly assume this, unless otherwise stated). All terms are identified up to renaming of their bound variables and labels. Given terms $M$ and $N$ and a variable $x$, $M[N/x]$ is the capture-avoiding (for both variables and labels) substitution of $N$ for the free occurrences of $x$ in $M$. We speak of a representative $M$ of a term when $M$ is not considered up to such renaming. A representative $M$ of a term is well-labeled if for every occurrence of $\boxed{a}$ in $M$ there is no other $\boxed{a}$ occurring in its scope.

**Definition 3 (Order for labels).** Let $M$ be a well-labeled representative of a term. We define an order $<_M$ on the labels occurring in $M$ as follows: $a <_M b$ if and only if $\boxed{b}$ occurs in the scope of $\boxed{a}$.

For a well-labeled and label-closed representative $M$, the order $<_M$ is a finite tree order.

**Definition 4.** Reduction $\rightsquigarrow \;=\; \rightsquigarrow_\beta \cup \rightsquigarrow_{\mathsf{p}}$ in $\Lambda_{\mathsf{PE}}$ consists of β-reduction $\rightsquigarrow_\beta$ and permutative or p-reduction $\rightsquigarrow_{\mathsf{p}}$, both defined as the contextual closure of the rules given in Figure 1. We write $\twoheadrightarrow$ for the reflexive–transitive closure of $\rightsquigarrow$, and likewise for reduction to normal form; similarly for $\rightsquigarrow_\beta$ and $\rightsquigarrow_{\mathsf{p}}$. We write $=_{\mathsf{p}}$ for the symmetric and reflexive–transitive closure of $\rightsquigarrow_{\mathsf{p}}$.

$$\boxed{a}.\,(\lambda x.\, x = x)(\top \overset{a}{\oplus} \bot) \;\rightsquigarrow_{\mathsf{p}}\; \boxed{a}.\,\big((\lambda x.\, x = x)\top\big) \overset{a}{\oplus} \big((\lambda x.\, x = x)\bot\big) \;\twoheadrightarrow_{\beta}\; \boxed{a}.\,(\top = \top) \overset{a}{\oplus} (\bot = \bot) \;=\; \boxed{a}.\,\top \overset{a}{\oplus} \top \;\twoheadrightarrow_{\mathsf{p}}\; \top$$

**Fig. 2.** Example Reduction of the cbv-Translation of the Term on p. 137

Two example reductions are (1)–(2) on p. 137; a third, complete reduction is in Figure 2. The crucial feature of p-reduction is that a choice $\overset{a}{\oplus}$ does permute out of the argument position of an application, but a generator $\boxed{a}$ does not, as below. Since the argument of a redex may be duplicated, this is how we characterize the difference between the outcome of a probabilistic event, whose duplicates may be identified, and the event itself, whose duplicates may yield different outcomes.

$$N\,(M \overset{a}{\oplus} P) \;\rightsquigarrow_{\mathsf{p}}\; (NM) \overset{a}{\oplus} (NP) \qquad\qquad N\,(\boxed{a}.\,M) \;\not\rightsquigarrow_{\mathsf{p}}\; \boxed{a}.\,NM$$
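The asymmetry can be made concrete with a one-step rewriter at the root of an application, over a hypothetical tuple encoding of terms (our own, not the paper's): the labeled choice permutes out of argument position, while the generator is left in place.

```python
def permute_arg(t):
    """Apply N (M ⊕ₐ P) ⇝ₚ (N M) ⊕ₐ (N P) at the root, if it matches."""
    if t[0] == 'app' and t[2][0] == 'choice':
        n, (_, a, m, p) = t[1], t[2]
        return ('choice', a, ('app', n, m), ('app', n, p))
    return t   # in particular, N (⊞a.M) is left unchanged

n = ('var', 'N')
print(permute_arg(('app', n, ('choice', 'a', ('var', 'M'), ('var', 'P')))))
print(permute_arg(('app', n, ('gen', 'a', ('var', 'M')))))   # unchanged
```

The first call distributes the application over the choice; the second returns its input, reflecting that an event in argument position must not be shared across duplicates.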

By inspection of the rewrite rules in Figure 1, we can then characterize the normal forms of $\rightsquigarrow_{\mathsf{p}}$ and of $\rightsquigarrow$ as follows.

**Proposition 5 (Normal forms).** The normal forms $P_0$ of $\rightsquigarrow_{\mathsf{p}}$, respectively $N_0$ of $\rightsquigarrow$, are characterized by the following grammars.

$$P_0 ::= P_1 \mid P_0 \overset{a}{\oplus} P_0 \qquad\quad P_1 ::= x \mid \lambda x.\,P_1 \mid P_1\, P_0$$
$$N_0 ::= N_1 \mid N_0 \overset{a}{\oplus} N_0 \qquad\quad N_1 ::= N_2 \mid \lambda x.\,N_1 \qquad\quad N_2 ::= x \mid N_2\, N_0$$

## **3 Properties of Permutative Reduction**

We will prove strong normalization and confluence of $\rightsquigarrow_{\mathsf{p}}$. For strong normalization, the obstacle is the interaction between different choice operators, which may duplicate each other, creating super-exponential growth.<sup>3</sup> Fortunately, Dershowitz's recursive path orders [12] seem tailor-made for our situation.

Observe that the set $\Lambda_{\mathsf{PE}}$ endowed with $\rightsquigarrow_{\mathsf{p}}$ is a first-order term rewriting system over a countably infinite set of variables and the signature $\Sigma$ given by:


<sup>3</sup> This was inferred only from a simple simulation; we would be interested to know a rigorous complexity result.

**Definition 6.** Let $M$ be a well-labeled representative of a label-closed term, and let $\Sigma_M$ be the set of signature symbols occurring in $M$. We define $\prec_M$ as the (strict) partial order on $\Sigma_M$ generated by the following rules.


**Lemma 7.** The reduction $\rightsquigarrow_{\mathsf{p}}$ is strongly normalizing.

Proof. For the first-order term rewriting system $(\Lambda_{\mathsf{PE}}, \rightsquigarrow_{\mathsf{p}})$ we derive a well-founded recursive path ordering $<$ from $\prec_M$ following [12, p. 289]. Let $f$ and $g$ range over function symbols, let $[N_1,\dots,N_n]$ denote a multiset and extend $<$ to multisets by the standard multiset ordering, and let $N = f(N_1,\dots,N_n)$ and $M = g(M_1,\dots,M_m)$; then

$$N < M \iff \begin{cases} [N_1, \dots, N_n] < [M_1, \dots, M_m] & \text{if } f = g \\ [N_1, \dots, N_n] < [M] & \text{if } f \prec_M g \\ [N] \le [M_1, \dots, M_m] & \text{if } f \not\succeq_M g \end{cases}$$

While $\prec_M$ is defined only relative to $\Sigma_M$, reduction may only reduce the signature. Inspection of Figure 1 then shows that $M \rightsquigarrow_{\mathsf{p}} N$ implies $N < M$.

**Confluence of Permutative Reduction.** With strong normalization, confluence of $\rightsquigarrow_{\mathsf{p}}$ requires only local confluence. We reduce the number of cases to consider by casting the permutations of $\overset{a}{\oplus}$ as instances of a common shape.

**Definition 8.** We define a context C[ ] (with exactly one hole [ ]) as follows, and let C[N] represent C[ ] with the hole [ ] replaced by N.

$$C[\ ] \;::=\; [\ ] \;\mid\; \lambda x.\,C[\ ] \;\mid\; C[\ ]\,M \;\mid\; N\,C[\ ] \;\mid\; C[\ ] \overset{a}{\oplus} M \;\mid\; N \overset{a}{\oplus} C[\ ] \;\mid\; \boxed{a}.\,C[\ ]$$

Observe that the six permutation rules for $\overset{a}{\oplus}$ in Figure 1 are all of the following form. We refer to these collectively as $(\oplus\star)$.

$$C[N \overset{a}{\oplus} M] \;\rightsquigarrow_{\mathsf{p}}\; C[N] \overset{a}{\oplus} C[M] \tag{$\oplus\star$}$$

**Lemma 9 (Confluence of $\rightsquigarrow_{\mathsf{p}}$).** Reduction $\rightsquigarrow_{\mathsf{p}}$ is confluent.

Proof. By Newman's lemma and strong normalization of $\rightsquigarrow_{\mathsf{p}}$ (Lemma 7), confluence follows from local confluence. The proof of local confluence consists of joining all critical pairs given by $\rightsquigarrow_{\mathsf{p}}$. Details are in the Appendix of [7].

**Definition 10.** We denote the unique $\rightsquigarrow_{\mathsf{p}}$-normal form of a term $N$ by $N_{\mathsf{p}}$.

## **4 Confluence**

We aim to prove that $\rightsquigarrow \;=\; \rightsquigarrow_\beta \cup \rightsquigarrow_{\mathsf{p}}$ is confluent. We will use the standard technique of parallel β-reduction [26], a simultaneous reduction step on a number of β-redexes, which we define via a labeling of the redexes to be reduced. The central point is to find a notion of reduction that is diamond, i.e. every critical pair can be closed in one (or zero) steps. This will be our complete reduction, which consists of parallel β-reduction followed by $\rightsquigarrow_{\mathsf{p}}$-reduction to normal form.

**Definition 11.** A labeled term $P^\bullet$ is a term $P$ with chosen β-redexes annotated as $(\lambda x.\,N)^\bullet M$. The unique labeled β-step $P^\bullet \Rrightarrow_\beta P_\bullet$ from $P^\bullet$ to the labeled reduct $P_\bullet$ reduces every labeled redex, and is defined inductively as follows.

$$\begin{aligned} (\lambda x.\,N^{\bullet})^{\bullet}M^{\bullet} &\Rrightarrow_{\beta} N_{\bullet}[M_{\bullet}/x] & N^{\bullet}M^{\bullet} &\Rrightarrow_{\beta} N_{\bullet}M_{\bullet} \\ x &\Rrightarrow_{\beta} x & N^{\bullet} \overset{a}{\oplus} M^{\bullet} &\Rrightarrow_{\beta} N_{\bullet} \overset{a}{\oplus} M_{\bullet} \\ \lambda x.\,N^{\bullet} &\Rrightarrow_{\beta} \lambda x.\,N_{\bullet} & \boxed{a}.\,N^{\bullet} &\Rrightarrow_{\beta} \boxed{a}.\,N_{\bullet} \end{aligned}$$

A parallel β-step $P \Rrightarrow_\beta P_\bullet$ is a labeled step $P^\bullet \Rrightarrow_\beta P_\bullet$ for some labeling $P^\bullet$.

Note that $P_\bullet$ is an unlabeled term, since all labels are removed in the reduction. For the empty labeling, $P^\bullet = P_\bullet = P$, so parallel reduction is reflexive: $P \Rrightarrow_\beta P$.
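For the λ-fragment of the calculus, the labeled step can be sketched as a single recursive pass that fires every marked redex. This is a simplified illustration (our own `'redex'` marker encodes $(\lambda x.N)^\bullet M$; substitution is naive, without capture avoidance, which suffices for the closed example below):

```python
def subst(t, x, s):
    """Naive substitution t[s/x] over 'var'/'lam'/'app'/'redex' terms."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    if tag == 'app':
        return ('app', subst(t[1], x, s), subst(t[2], x, s))
    if tag == 'redex':
        _, y, body, arg = t
        b = body if y == x else subst(body, x, s)
        return ('redex', y, b, subst(arg, x, s))
    raise ValueError(tag)

def parallel(t):
    """One parallel β-step: reduce every labeled redex, bottom-up."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], parallel(t[2]))
    if tag == 'app':
        return ('app', parallel(t[1]), parallel(t[2]))
    if tag == 'redex':                    # (λx.N)•M ⤇β N[M/x]
        _, x, body, arg = t
        return subst(parallel(body), x, parallel(arg))
    raise ValueError(tag)

# (λx. x x)•((λy. y)• z)  reduces to  z z  in a single parallel step
t = ('redex', 'x', ('app', ('var', 'x'), ('var', 'x')),
     ('redex', 'y', ('var', 'y'), ('var', 'z')))
print(parallel(t))   # ('app', ('var', 'z'), ('var', 'z'))
```

Reducing nested labeled redexes in one pass is exactly what gives parallel reduction the diamond property exploited below.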

**Lemma 12.** A parallel β-step $P \Rrightarrow_\beta P_\bullet$ is a β-reduction $P \twoheadrightarrow_\beta P_\bullet$.

Proof. By induction on the labeled term $P^\bullet$ generating $P \Rrightarrow_\beta P_\bullet$.

**Lemma 13.** Parallel β-reduction is diamond.

Proof. Let $P^\bullet \Rrightarrow_\beta P_\bullet$ and $P^\circ \Rrightarrow_\beta P_\circ$ be two labeled reduction steps on a term $P$. We annotate each step with the labels of the other, preserved by reduction, to give the span from the doubly labeled term $P^{\bullet\circ} = P^{\circ\bullet}$ below left. Reducing the remaining labels will close the diagram, as below right.

$$P^{\circ}_{\bullet} \;{}_{\beta}\Lleftarrow\; P^{\bullet\circ} = P^{\circ\bullet} \;\Rrightarrow_{\beta}\; P^{\bullet}_{\circ} \qquad\qquad P^{\circ}_{\bullet} \;\Rrightarrow_{\beta}\; P_{\bullet\circ} = P_{\circ\bullet} \;{}_{\beta}\Lleftarrow\; P^{\bullet}_{\circ}$$

This is proved by induction on $P^{\bullet\circ}$, where only two cases are not immediate: those where a redex carries one but not the other label. One case follows by the diagram below; the other case is symmetric. Below, for the step top right, induction on $N^\bullet$ shows that $N^{\bullet}_{\circ}[M^{\bullet}_{\circ}/x] \Rrightarrow_\beta N_{\circ\bullet}[M_{\circ\bullet}/x]$.

$$\begin{array}{ccc} (\lambda x.\,N^{\bullet\circ})^{\circ}\,M^{\bullet\circ} & \Rrightarrow_{\beta} & N^{\bullet}_{\circ}[M^{\bullet}_{\circ}/x] \;\Rrightarrow_{\beta}\; N_{\circ\bullet}[M_{\circ\bullet}/x] \\[4pt] (\lambda x.\,N^{\bullet\circ})^{\circ}\,M^{\bullet\circ} & \Rrightarrow_{\beta} & (\lambda x.\,N^{\circ}_{\bullet})^{\circ}\,M^{\circ}_{\bullet} \;\Rrightarrow_{\beta}\; N_{\bullet\circ}[M_{\bullet\circ}/x] = N_{\circ\bullet}[M_{\circ\bullet}/x] \end{array}$$

#### **4.1 Parallel Reduction and Permutative Reduction**

For the commutation of (parallel) β-reduction with $\rightsquigarrow_{\mathsf{p}}$, we run into the minor issue that a permuting generator or choice operator may block a redex: in both cases below, before the $\rightsquigarrow_{\mathsf{p}}$-step the term has a redex, but after it the redex is blocked.

$$(\lambda x.\,N \overset{a}{\oplus} M)\,P \;\rightsquigarrow_{\mathsf{p}}\; ((\lambda x.\,N) \overset{a}{\oplus} (\lambda x.\,M))\,P \qquad\qquad (\lambda x.\,\boxed{a}.\,N)\,M \;\rightsquigarrow_{\mathsf{p}}\; (\boxed{a}.\,\lambda x.\,N)\,M$$

We address this by an adaptation of p-reduction to labeled terms, a strategy within $\rightsquigarrow_{\mathsf{p}}$ that permutes past a labeled redex in one step.

**Definition 14.** A labeled p-reduction $N^\bullet \rightsquigarrow_{\mathsf{p}} M^\bullet$ on labeled terms is a p-reduction of one of the forms

$$(\lambda x.\,N^{\bullet} \overset{a}{\oplus} M^{\bullet})^{\bullet}\, P^{\bullet} \;\rightsquigarrow_{\mathsf{p}}\; (\lambda x.\,N^{\bullet})^{\bullet}\, P^{\bullet} \overset{a}{\oplus} (\lambda x.\,M^{\bullet})^{\bullet}\, P^{\bullet}$$

$$(\lambda x.\,\boxed{a}.\,N^{\bullet})^{\bullet}\, M^{\bullet} \;\rightsquigarrow_{\mathsf{p}}\; \boxed{a}.\,(\lambda x.\,N^{\bullet})^{\bullet}\, M^{\bullet}$$

or a single $\rightsquigarrow_{\mathsf{p}}$-step on unlabeled constructors in $N^\bullet$.

**Lemma 15.** Reduction to normal form by labeled p-reduction is equal to reduction to normal form by $\rightsquigarrow_{\mathsf{p}}$ (on labeled terms).

Proof. Observe that labeled p-reduction and $\rightsquigarrow_{\mathsf{p}}$ have the same normal forms. Then in one direction, since every labeled p-step is a p-reduction, reduction to normal form by labeled steps is contained in that by $\rightsquigarrow_{\mathsf{p}}$. Conversely, let $N \twoheadrightarrow_{\mathsf{p}} M$ with $M$ normal. On this reduction, let $P \rightsquigarrow_{\mathsf{p}} Q$ be the first step that is not a labeled p-step. Then there is an $R$ such that $P$ reduces to $R$ by a labeled p-step and $Q \twoheadrightarrow_{\mathsf{p}} R$. Note that we then have $N \twoheadrightarrow_{\mathsf{p}} R$. By confluence, $R \twoheadrightarrow_{\mathsf{p}} M$, and by induction on the sum of lengths of reduction paths in $\rightsquigarrow_{\mathsf{p}}$ from $R$ (smaller than from $N$) we have that $R$ reduces to $M$ by labeled p-steps, and hence so does $N$.

The following lemmata then give the required commutation properties of the relations $\rightsquigarrow_{\mathsf{p}}$, labeled p-reduction, and $\Rrightarrow_\beta$. Figure 3 illustrates these by commuting diagrams.

**Lemma 16.** If $N^\bullet \rightsquigarrow_{\mathsf{p}} M^\bullet$ by a labeled p-step, then $N_\bullet =_{\mathsf{p}} M_\bullet$.

Proof. By induction on the rewrite step. The two interesting cases are:

$$\begin{array}{ccc} (\lambda x.\,M^{\bullet})^{\bullet}(N^{\bullet} \overset{a}{\oplus} P^{\bullet}) & \rightsquigarrow_{\mathsf{p}} & ((\lambda x.\,M^{\bullet})^{\bullet}N^{\bullet}) \overset{a}{\oplus} ((\lambda x.\,M^{\bullet})^{\bullet}P^{\bullet}) \\ {}_{\beta}\Downarrow & & \Downarrow_{\beta} \qquad (x \in \mathsf{fv}(M)) \\ M_{\bullet}[(N_{\bullet} \overset{a}{\oplus} P_{\bullet})/x] & \twoheadrightarrow_{\mathsf{p}} & M_{\bullet}[N_{\bullet}/x] \overset{a}{\oplus} M_{\bullet}[P_{\bullet}/x] \end{array}$$

$$\begin{array}{ccc} (\lambda x.\,M^{\bullet})^{\bullet}(N^{\bullet} \overset{a}{\oplus} P^{\bullet}) & \rightsquigarrow_{\mathsf{p}} & ((\lambda x.\,M^{\bullet})^{\bullet}N^{\bullet}) \overset{a}{\oplus} ((\lambda x.\,M^{\bullet})^{\bullet}P^{\bullet}) \\ {}_{\beta}\Downarrow & & \Downarrow_{\beta} \qquad (x \notin \mathsf{fv}(M)) \\ M_{\bullet} & {}_{\mathsf{p}}\twoheadleftarrow & M_{\bullet} \overset{a}{\oplus} M_{\bullet} \end{array}$$

How the critical pairs in the above diagrams are joined shows that we cannot use the Hindley–Rosen Lemma [2, Prop. 3.3.5] to prove confluence of $\rightsquigarrow_\beta \cup \rightsquigarrow_{\mathsf{p}}$.

**Lemma 17.** $N_\bullet =_{\mathsf{p}} (N_{\mathsf{p}})_\bullet$.

Proof. Using Lemma 15 we decompose $N^\bullet \twoheadrightarrow_{\mathsf{p}} N^{\bullet}_{\mathsf{p}}$ into labeled p-steps as

$$N^\bullet = N_1^\bullet \;\rightsquigarrow_{\mathsf{p}}\; N_2^\bullet \;\rightsquigarrow_{\mathsf{p}}\; \cdots \;\rightsquigarrow_{\mathsf{p}}\; N_n^\bullet = N_{\mathsf{p}}^\bullet$$

where $(N_i)_\bullet =_{\mathsf{p}} (N_{i+1})_\bullet$ by Lemma 16.

#### **4.2 Complete Reduction**

To obtain a reduction strategy with the diamond property for $\rightsquigarrow$, we combine parallel reduction $\Rrightarrow_\beta$ with permutative reduction to normal form into a notion of complete reduction $\Rrightarrow$. We will show that it is diamond (Lemma 19), and that any step in $\rightsquigarrow$ maps onto a complete step on p-normal forms (Lemma 20). Confluence of $\rightsquigarrow$ (Theorem 21) then follows: any two reduction paths map onto complete paths on p-normal forms, which then converge by the diamond property.

**Definition 18.** A complete reduction step $N \Rrightarrow N_{\bullet\mathsf{p}}$ is a parallel β-step followed by p-reduction to normal form:

$$N \;\Rrightarrow\; N_{\bullet\mathsf{p}} \quad := \quad N \;\Rrightarrow_{\beta}\; N_{\bullet} \;\twoheadrightarrow_{\mathsf{p}}\; N_{\bullet\mathsf{p}}$$

**Lemma 19 (Complete reduction is diamond).** If $P \Lleftarrow N \Rrightarrow M$ then for some $Q$, $P \Rrightarrow Q \Lleftarrow M$.

Proof. By the following diagram, where $M = N_{\circ\mathsf{p}}$, $P = N_{\bullet\mathsf{p}}$, and $Q = N_{\circ\bullet\mathsf{p}}$. The square top left is by Lemma 13, top right and bottom left are by Lemma 17, and bottom right is by confluence and strong normalization of p-reduction.

$$\begin{array}{ccccc} N^{\circ\bullet} & \Rrightarrow_{\beta} & N^{\bullet}_{\circ} & \twoheadrightarrow_{\mathsf{p}} & N^{\bullet}_{\circ\mathsf{p}} \\ {}_{\beta}\Downarrow & & {}_{\beta}\Downarrow & & {}_{\beta}\Downarrow \\ N^{\circ}_{\bullet} & \Rrightarrow_{\beta} & N_{\circ\bullet} & =_{\mathsf{p}} & N_{\circ\mathsf{p}\bullet} \\ {}_{\mathsf{p}}\downarrow & & & & {}_{\mathsf{p}}\downarrow \\ N^{\circ}_{\bullet\mathsf{p}} & \Rrightarrow_{\beta} & N_{\bullet\mathsf{p}\circ} & \twoheadrightarrow_{\mathsf{p}} & N_{\circ\bullet\mathsf{p}} \end{array}$$

**Lemma 20 (p-Normalization maps reduction to complete reduction).** If $N \rightsquigarrow M$ then $N_{\mathsf{p}} \Rrightarrow M_{\mathsf{p}}$.

Proof. For a p-step $N \rightsquigarrow_{\mathsf{p}} M$ we have $N_{\mathsf{p}} = M_{\mathsf{p}}$, while $\Rrightarrow_\beta$ is reflexive. For a β-step $N \rightsquigarrow_\beta M$ we label the reduced redex in $N$ to get $N^\bullet \Rrightarrow_\beta N_\bullet = M$. Then Lemma 17 gives $(N_{\mathsf{p}})_\bullet =_{\mathsf{p}} M$, and hence $N_{\mathsf{p}} \Rrightarrow_\beta (N_{\mathsf{p}})_\bullet \twoheadrightarrow_{\mathsf{p}} M_{\mathsf{p}}$.

**Fig. 3.** Diagrams for the Lemmata Leading up to Confluence

**Theorem 21.** Reduction $\rightsquigarrow$ is confluent.

Proof. By the following diagram. For the top and left areas, by Lemma 20 any reduction path $N \twoheadrightarrow M$ maps onto a complete path from $N_{\mathsf{p}}$ to $M_{\mathsf{p}}$. The main square follows by the diamond property of complete reduction, Lemma 19.

**5 Strong Normalization for Simply-Typed Terms**

In this section, we prove that the relation $\rightsquigarrow$ enjoys strong normalization on simply-typed terms. Our proof of strong normalization is based on the classic reducibility technique, and inherently has to deal with label-open terms. It thus makes sense to turn the order $<_M$ from Definition 3 into something more formal, at the same time allowing terms to be label-open. This is done in Figure 4. It is easy to see that, modulo label α-equivalence, for every term $M$ there is at least one $\theta$ such that $\theta \vdash_L M$. It is also easy to check that if $\theta \vdash_L M$ and $M \rightsquigarrow N$, then $\theta \vdash_L N$. It thus makes sense to parametrize reduction on a sequence of labels $\theta$, i.e., one can define a family of reduction relations $\rightsquigarrow_\theta$ on pairs of the form $(M, \theta)$. The set of strongly normalizable terms and the number of steps to normal form then become parametric:



**Fig. 4.** Labeling Terms


**Fig. 5.** Types, Environments, Judgments, and Rules


**Fig. 6.** Closure Rules for Sets SN <sup>θ</sup>

We can now define types, environments, judgments, and typing rules in Figure 5.

Notice that the type structure is precisely that of the usual, vanilla, simply-typed λ-calculus (although the terms are of course different), and we can thus reuse most of the usual proof of strong normalization, for example in the version given in Ralph Loader's notes [21], page 17.

**Lemma 22.** The closure rules in Figure 6 are all sound.

Since the structure of the type system is that of plain simple types, the definition of the reducibility sets is the classic one:

$$\begin{aligned} \operatorname{Red}\_{\alpha} &= \{ (\Gamma, \theta, M) \mid M \in \operatorname{SN}^{\theta} \land \Gamma \vdash M : \alpha \}; \\ \operatorname{Red}\_{\tau \Rightarrow \rho} &= \{ (\Gamma, \theta, M) \mid (\Gamma \vdash M : \tau \Rightarrow \rho) \land (\theta \vdash\_{L} M) \land \\ &\quad \forall (\Gamma \Delta, \theta, N) \in \operatorname{Red}\_{\tau} . (\Gamma \Delta, \theta, M N) \in \operatorname{Red}\_{\rho} \}. \end{aligned}$$

Before proving that all terms are reducible, we need some auxiliary results.

**Lemma 23.** 1. If $(\Gamma, \theta, M) \in \mathrm{Red}_\tau$, then $M \in \mathrm{SN}^\theta$.


Proof. The proof is an induction on $\tau$: if $\tau$ is an atom $\alpha$, then Point 1 follows by definition, while Points 2 to 5 come from Lemma 22. If $\tau$ is $\rho \Rightarrow \mu$, Points 2 to 5 come directly from the induction hypothesis, while Point 1 can be proved by observing that $M$ is in $\mathrm{SN}^\theta$ if $Mx$ is itself in $\mathrm{SN}^\theta$, where $x$ is a fresh variable. By the induction hypothesis (on Point 2), we have $(\Gamma(x : \rho), \theta, x) \in \mathrm{Red}_\rho$, and we conclude that $(\Gamma(x : \rho), \theta, Mx) \in \mathrm{Red}_\mu$.

The following is the so-called Main Lemma:

**Proposition 24.** Suppose $y_1 : \tau_1, \dots, y_n : \tau_n \vdash M : \rho$ and $\theta \vdash_L M$, with $(\Gamma, \theta, N_j) \in \mathrm{Red}_{\tau_j}$ for all $1 \le j \le n$. Then $(\Gamma, \theta, M[N_1/y_1, \dots, N_n/y_n]) \in \mathrm{Red}_\rho$.

Proof. This is an induction on the structure of the term M:


$$(\Gamma, \theta, L[\overline{N}/\overline{y}]) \in \mathrm{Red}_{\xi \Rightarrow \rho} \qquad (\Gamma, \theta, P[\overline{N}/\overline{y}]) \in \mathrm{Red}_{\xi}$$

By definition, we get

$$(\Gamma, \theta, (LP)[\overline{N}/\overline{y}]) \in \mathrm{Red}_{\rho}$$

• If $M$ is an abstraction $\lambda x.\,L$, then $\rho$ is an arrow type $\xi \Rightarrow \mu$ and $y_1 : \tau_1, \dots, y_n : \tau_n, x : \xi \vdash L : \mu$. Now consider any $(\Gamma\Delta, \theta, P) \in \mathrm{Red}_\xi$. Our objective is to prove from this hypothesis that $(\Gamma\Delta, \theta, (\lambda x.\,L[\overline{N}/\overline{y}])P) \in \mathrm{Red}_\mu$. By the induction hypothesis, since $(\Gamma\Delta, \theta, N_i) \in \mathrm{Red}_{\tau_i}$, we get that $(\Gamma\Delta, \theta, L[\overline{N}/\overline{y}, P/x]) \in \mathrm{Red}_\mu$. The thesis follows from Lemma 23.


We now have all the ingredients for our proof of strong normalization:

**Theorem 25.** If $\Gamma \vdash M : \tau$ and $\theta \vdash_L M$, then $M \in \mathrm{SN}^\theta$.

Proof. Suppose that $x_1 : \rho_1, \dots, x_n : \rho_n \vdash M : \tau$. Since $x_1 : \rho_1, \dots, x_n : \rho_n \vdash x_i : \rho_i$ for every $i$, and clearly $\theta \vdash_L x_i$ for every $i$, we can apply Proposition 24 and obtain that $(\Gamma, \theta, M[\overline{x}/\overline{x}]) \in \mathrm{Red}_\tau$, from which, via Lemma 23, one gets the thesis.

## **6 Projective Reduction**

Permutative reduction $\rightsquigarrow_{\mathsf{p}}$ evaluates probabilistic sums purely by rewriting. Here we look at a more standard projective notion of reduction, which conforms more closely to the intuition that $\boxed{a}$ generates a probabilistic event to determine the choice $\overset{a}{\oplus}$. Using $+$ for an external probabilistic sum, we expect to reduce $\boxed{a}.\,N$ to $N_0 + N_1$, where each $N_i$ is obtained from $N$ by projecting every subterm $M_0 \overset{a}{\oplus} M_1$ to $M_i$. The question is, in what context should we admit this reduction? We first limit ourselves to reducing in head position.

**Definition 26.** The $a$-projections $\pi_0^a(N)$ and $\pi_1^a(N)$ are defined as follows:

$$\begin{aligned} \pi\_0^a(N \stackrel{a}{\oplus} M) &= \pi\_0^a(N) & \pi\_i^a(\lambda x.N) &= \lambda x.\pi\_i^a(N) \\ \pi\_1^a(N \stackrel{a}{\oplus} M) &= \pi\_1^a(M) & \pi\_i^a(NM) &= \pi\_i^a(N)\,\pi\_i^a(M) \\ \pi\_i^a(\boxed{a}.N) &= \boxed{a}.N & \pi\_i^a(N \stackrel{b}{\oplus} M) &= \pi\_i^a(N) \stackrel{b}{\oplus} \pi\_i^a(M) & \text{if } a \neq b \\ \pi\_i^a(x) &= x & \pi\_i^a(\boxed{b}.N) &= \boxed{b}.\pi\_i^a(N) & \text{if } a \neq b. \end{aligned}$$
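The projections act homomorphically everywhere except at a choice or generator on the projected label. A Python sketch over a hypothetical tuple encoding of terms (our own, not the paper's):

```python
def proj(i, a, t):
    """a-projection πᵢᵃ of Definition 26, for i in {0, 1}."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], proj(i, a, t[2]))
    if tag == 'app':
        return ('app', proj(i, a, t[1]), proj(i, a, t[2]))
    if tag == 'choice':
        _, b, n, m = t
        if b == a:                       # resolve the choice on a
            return proj(i, a, n if i == 0 else m)
        return ('choice', b, proj(i, a, n), proj(i, a, m))
    if tag == 'gen':
        _, b, n = t
        if b == a:                       # a is rebound: stop projecting
            return t
        return ('gen', b, proj(i, a, n))
    raise ValueError(tag)

c = ('choice', 'a', ('var', '⊤'), ('var', '⊥'))
print(proj(0, 'a', c), proj(1, 'a', c))   # ('var', '⊤') ('var', '⊥')
```

Note the two stopping cases: a choice on $a$ is resolved to the selected branch, while a nested $\boxed{a}$ shields its scope, since its label is a distinct bound occurrence.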

**Definition 27.** <sup>A</sup> head context H[ ] is given by the following grammar.

$$H[\ ] \;::=\; [\ ] \;\mid\; \lambda x.\,H[\ ] \;\mid\; H[\ ]\,N$$

**Definition 28.** Projective head reduction $\rightsquigarrow_{\pi\mathsf{h}}$ is given by

$$H[\boxed{a}.\,N] \;\rightsquigarrow_{\pi\mathsf{h}}\; H[\pi_0^a(N)] + H[\pi_1^a(N)]\ .$$

We can simulate $\rightsquigarrow_{\pi\mathsf{h}}$ by permutative reduction if we interpret the external sum $+$ by an outermost $\oplus$ (taking special care if the label does not occur).

**Proposition 29.** Permutative reduction simulates projective head reduction:

$$H[\boxed{a}.N] \quad \twoheadrightarrow_{\mathsf p} \quad \begin{cases} H[N] & \text{if } a \notin \mathsf{fl}(N) \\ \boxed{a}.\,H[\pi_0^a(N)] \stackrel{a}{\oplus} H[\pi_1^a(N)] & \text{otherwise.} \end{cases}$$

Proof. The case $a \notin \mathsf{fl}(N)$ is immediate, by a single step. For the other case, observe that $H[\boxed{a}.N] \twoheadrightarrow_{\mathsf p} \boxed{a}.H[N]$ by $\lambda$- and $\mathsf f$-steps, and, since $a$ does not occur in $H[\ ]$, that $H[\pi_i^a(N)] = \pi_i^a(H[N])$. By induction on $N$, if $a$ is minimal in $N$ (i.e. $a \in \mathsf{fl}(N)$ and $a \leq b$ for all $b \in \mathsf{fl}(N)$) then $N \twoheadrightarrow_{\mathsf p} \pi_0^a(N) \stackrel{a}{\oplus} \pi_1^a(N)$. As required,

$$H[\boxed{a}.N] \quad \twoheadrightarrow_{\mathsf p} \quad \boxed{a}.\, H[\pi_0^a(N)] \stackrel{a}{\oplus} H[\pi_1^a(N)] \qquad \text{if } a \in \mathsf{fl}(N)\ . \tag{7}$$

A gap remains between the generators that will not be duplicated, and which we should therefore be able to reduce, and the generators that projective head reduction actually reduces. In particular, to interpret call-by-value probabilistic reduction in Section 7, we would like to reduce under other generators. However, permutative reduction does not permit exchanging generators, and so only simulates reducing in head position. While (independent) probabilistic events are generally considered interchangeable, it is debatable whether the equivalence below is desirable.

$$
\boxed{a}.\boxed{b}.N \ \sim\ \boxed{b}.\boxed{a}.N \tag{4}
$$

We elide the issue by externalizing probabilistic events, and by reducing with reference to a predetermined binary stream $s \in \{0,1\}^{\mathbb N}$ representing their outcomes. In this way we preserve the intuitions of both permutative and projective reduction: we obtain a qualified version of the equivalence (4) (see (5) below), and we will be able to reduce any generator on the spine of a term: under (other) generators and choices, as well as under abstractions and in function position.

**Definition 30.** The set of streams is $\mathbb S = \{0,1\}^{\mathbb N}$, ranged over by $r, s, t$, and $i \cdot s$ denotes the stream with $i \in \{0,1\}$ as its first element and $s$ as the remainder.

**Definition 31.** The stream labeling $N^s$ of a term $N$ with a stream $s \in \mathbb S$, which annotates generators as $\boxed{a}^i$ with $i \in \{0,1\}$ and variables as $x^s$ with a stream $s$, is given inductively below. We lift $\beta$-reduction to stream-labeled terms by introducing a substitution case for stream-labeled variables: $x^s[M/x] = M^s$.

$$\begin{aligned} (\lambda x.N)^s &= \lambda x.N^s &\qquad (\boxed{a}.N)^{i\cdot s} &= \boxed{a}^i.N^s \\ (N\,M)^s &= N^s\,M &\qquad (N \stackrel{a}{\oplus} M)^s &= N^s \stackrel{a}{\oplus} M^s \end{aligned}$$
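To make the bookkeeping of Definition 31 concrete, here is a Python sketch of stream labeling over a hypothetical term datatype of our own (not the paper's), using a finite stream prefix, represented as a list of bits, as the text allows: generators on the spine consume successive bits, while application arguments stay unlabeled.

```python
from dataclasses import dataclass

# Hypothetical AST (our names): Gen(a, N) is a.N, Choice(a, N, M) is N ⊕a M,
# LGen(a, i, N) is a labeled generator a^i.N, LVar(x, s) is a labeled variable x^s.
@dataclass
class Var: name: str

@dataclass
class Lam: var: str; body: object

@dataclass
class App: fun: object; arg: object

@dataclass
class Gen: label: str; body: object

@dataclass
class Choice: label: str; left: object; right: object

@dataclass
class LGen: label: str; bit: int; body: object

@dataclass
class LVar: name: str; stream: tuple

def stream_label(t, s):
    """N^s of Definition 31, with s a finite stream prefix (list of bits):
    generators on the spine consume successive bits of s; arguments of
    applications stay unlabeled until a β-step puts them on the spine."""
    if not s:
        return t                        # prefix exhausted: partial labeling
    if isinstance(t, Var):
        return LVar(t.name, tuple(s))   # x^s; substitution re-labels: x^s[M/x] = M^s
    if isinstance(t, Lam):
        return Lam(t.var, stream_label(t.body, s))
    if isinstance(t, App):              # (N M)^s = N^s M: the argument keeps no label
        return App(stream_label(t.fun, s), t.arg)
    if isinstance(t, Gen):              # (a.N)^{i·s} = a^i.N^s
        return LGen(t.label, s[0], stream_label(t.body, s[1:]))
    if isinstance(t, Choice):           # (N ⊕a M)^s = N^s ⊕a M^s
        return Choice(t.label, stream_label(t.left, s), stream_label(t.right, s))
    raise TypeError(t)
```

For example, labeling $\boxed{a}.\boxed{b}.x$ with the prefix $[0,1]$ gives both generators a bit each and leaves $x$ unlabeled, since the prefix is exhausted.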

**Definition 32.** Projective reduction $\to_\pi$ on stream-labeled terms is the rewrite relation given by

$$
\boxed{a}^i.N \ \to_\pi\ \pi_i^a(N)\ .
$$

Observe that in $N^s$, a generator that occurs under $n$ other generators on the spine of $N$ is labeled with the element of $s$ at position $n+1$. Generators in argument position remain unlabeled until a $\beta$-step places them on the spine, in which case they become labeled by the new substitution case. We allow a term to be annotated with a finite prefix of a stream, e.g. $N^i$ with a singleton $i$, so that only part of the spine is labeled. Subsequent labeling of a partly labeled term is then given by $(N^r)^s = N^{r\cdot s}$ (abusing notation). To introduce streams via the external probabilistic sum, and to ignore an unused remaining stream after completing a probabilistic computation, we adopt the following equation.

$$N = N^0 + N^1$$

**Proposition 33.** Projective reduction generalizes projective head reduction:

$$H[\boxed{a}.N] \ =\ H[\boxed{a}^0.N] + H[\boxed{a}^1.N] \ \twoheadrightarrow_\pi\ H[\pi_0^a(N)] + H[\pi_1^a(N)]\ .$$

Returning to the interchangeability of probabilistic events, we refine (4) by exchanging the corresponding elements of the annotating streams:

$$\begin{array}{rclcl} (\boxed{a}.\boxed{b}.N)^{i\cdot j\cdot s} &=& \boxed{a}^i.\boxed{b}^j.N^s &\twoheadrightarrow_\pi& \pi_i^a(\pi_j^b(N^s)) \\ &&&& \sim \\ (\boxed{b}.\boxed{a}.N)^{j\cdot i\cdot s} &=& \boxed{b}^j.\boxed{a}^i.N^s &\twoheadrightarrow_\pi& \pi_j^b(\pi_i^a(N^s)) \end{array} \tag{5}$$

Stream labeling externalizes all probabilities, making reduction deterministic. This is expressed by the following proposition, stating that stream labeling commutes with reduction: if a generator remains unlabeled in $M$ and becomes labeled after a reduction step $M \to N$, the label it receives is predetermined. The deeper reason is that stream labeling assigns an outcome to each generator in a way that corresponds to a call-by-name strategy for probabilistic reduction.

**Proposition 34.** If $M \to N$ by a step other than $\boxed{a}.N \to_{\mathsf p} N$ (for $a \notin \mathsf{fl}(N)$), then $M^s \to N^s$.

**Remark 35.** The statement is false for the rule $\boxed{a}.N \to_{\mathsf p} N$ (for $a \notin \mathsf{fl}(N)$), as it removes a generator but not an element from the stream. Arguably, for this reason the rule should be excluded from the calculus. On the other hand, the rule is necessary to implement idempotence of $\oplus$, rather than just of $\stackrel{a}{\oplus}$, as follows.

> $N \oplus N \;=\; \boxed{a}.\,N \stackrel{a}{\oplus} N \ \to_{\mathsf p}\ \boxed{a}.N \ \to_{\mathsf p}\ N$ &nbsp;&nbsp; where $a \notin \mathsf{fl}(N)$

The proposition below then expresses that projective reduction is an invariant for permutative reduction. If $N \to_{\mathsf p} M$ by a step (other than the rule above) on a labeled generator $\boxed{a}^i$ or a corresponding choice $\stackrel{a}{\oplus}$, then $N$ and $M$ reduce to a common term, $N \twoheadrightarrow_\pi P \twoheadleftarrow_\pi M$, by the projective steps evaluating $\boxed{a}^i$.

**Proposition 36.** Projective reduction is an invariant for permutative reduction, as follows (with a case for $\mathsf c_2$ symmetric to $\mathsf c_1$, and where $C[\ ]$ is a context).

$$\begin{array}{ccc} \boxed{a}^i.C[N \stackrel{a}{\oplus} N] & \to_{\mathsf p} & \boxed{a}^i.C[N] \\ {\scriptstyle \pi}\big\downarrow & & \big\downarrow{\scriptstyle \pi} \\ \pi_i^a(C[N]) & = & \pi_i^a(C[N]) \end{array} \qquad \begin{array}{ccc} \boxed{a}^i.C[(N_0 \stackrel{a}{\oplus} M) \stackrel{a}{\oplus} N_1] & \to_{\mathsf p} & \boxed{a}^i.C[N_0 \stackrel{a}{\oplus} N_1] \\ {\scriptstyle \pi}\big\downarrow & & \big\downarrow{\scriptstyle \pi} \\ \pi_i^a(C[N_0 \stackrel{a}{\oplus} N_1]) & = & \pi_i^a(C[N_0 \stackrel{a}{\oplus} N_1]) \end{array}$$

$$\begin{array}{ccc} \lambda x.\boxed{a}^i.N & \to_{\mathsf p} & \boxed{a}^i.\lambda x.N \\ {\scriptstyle \pi}\big\downarrow & & \big\downarrow{\scriptstyle \pi} \\ \lambda x.\pi_i^a(N) & = & \pi_i^a(\lambda x.N) \end{array} \qquad \begin{array}{ccc} (\boxed{a}^i.N)\,M & \to_{\mathsf p} & \boxed{a}^i.NM \\ {\scriptstyle \pi}\big\downarrow & & \big\downarrow{\scriptstyle \pi} \\ \pi_i^a(N)\,M & = & \pi_i^a(NM) \end{array}$$

## **7 Call-by-value Interpretation**

We consider the interpretation of a call-by-value probabilistic λ-calculus. For simplicity we allow duplicating (or deleting) β-redexes, and only restrict duplicating probabilities; our values $V$ are then just deterministic terms, i.e. terms without choices, possibly applications and not necessarily β-normal (so that our $\to_{\beta_{\mathsf v}}$ is actually β-reduction on deterministic terms, unlike [9]). We evaluate the internal probabilistic choice $\oplus_{\mathsf v}$ to an external probabilistic choice $+$.

$$\begin{array}{llcrl} N &::= x \mid \lambda x.N \mid MN \mid M \oplus_{\mathsf v} N &\qquad& (\lambda x.N)\,V &\to_{\beta_{\mathsf v}}\ N[V/x] \\ V, W &::= x \mid \lambda x.V \mid VW &\qquad& M \oplus_{\mathsf v} N &\to_{\mathsf v}\ M + N \end{array}$$

The interpretation $\llbracket N \rrbracket_{\mathsf v}$ of a call-by-value term $N$ into $\Lambda_{\mathsf{PE}}$ is given as follows. First, we translate $N$ to a label-open term $\llbracket N \rrbracket_{\mathsf{open}} = \theta \vdash_L P$ by replacing each choice $\oplus_{\mathsf v}$ with a choice $\stackrel{a}{\oplus}$ with a unique label $a$, where the label context $\theta$ collects the labels used. Then $\llbracket N \rrbracket_{\mathsf v}$ is the label closure $\llbracket N \rrbracket_{\mathsf v} = \lfloor \theta \vdash_L P \rfloor$, which prefixes $P$ with a generator $\boxed{a}$ for every $a$ in $\theta$.

**Definition 37 (Call-by-value interpretation).** The open interpretation $\llbracket N \rrbracket_{\mathsf{open}}$ of a call-by-value term $N$ is given as follows, where all labels $a$ are fresh, and inductively $\llbracket N_i \rrbracket_{\mathsf{open}} = \theta_i \vdash_L P_i$ for $i \in \{1, 2\}$.

$$\begin{array}{rclcrcl} \llbracket x \rrbracket_{\mathsf{open}} &=& \vdash_L x & & \llbracket N_1 N_2 \rrbracket_{\mathsf{open}} &=& \theta_2 \cdot \theta_1 \vdash_L P_1 P_2 \\ \llbracket \lambda x.N_1 \rrbracket_{\mathsf{open}} &=& \theta_1 \vdash_L \lambda x.P_1 & & \llbracket N_1 \oplus_{\mathsf v} N_2 \rrbracket_{\mathsf{open}} &=& \theta_2 \cdot \theta_1 \cdot a \vdash_L P_1 \stackrel{a}{\oplus} P_2 \end{array}$$

The label closure $\lfloor \theta \vdash_L P \rfloor$ is given inductively as follows.

$$\left\lfloor \vdash\_L P \right\rfloor = P \qquad \left\lfloor a \cdot \theta \vdash\_L P \right\rfloor = \left\lfloor \theta \vdash\_L \boxed{a}.P \right\rfloor.$$
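The label closure is a simple fold over the label context. A minimal Python sketch (with a hypothetical `Gen` constructor of our own for $\boxed{a}.P$) makes the orientation explicit: since $\lfloor a \cdot \theta \vdash_L P \rfloor = \lfloor \theta \vdash_L \boxed{a}.P \rfloor$, the first label of $\theta$ ends up innermost.

```python
from dataclasses import dataclass

# Hypothetical constructor (ours): Gen(a, P) stands for the generator-prefixed term a.P.
@dataclass
class Gen:
    label: str
    body: object

def label_closure(theta, P):
    """⌊θ ⊢_L P⌋ of Definition 37: each pass peels the first label a off θ
    and wraps P as a.P, so earlier labels of θ end up deeper in the term."""
    for a in theta:
        P = Gen(a, P)
    return P
```

For instance, `label_closure(["a", "b"], P)` builds $\boxed{b}.\boxed{a}.P$.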

The call-by-value interpretation of $N$ is $\llbracket N \rrbracket_{\mathsf v} = \lfloor \llbracket N \rrbracket_{\mathsf{open}} \rfloor$.

Our call-by-value reduction may choose an arbitrary order in which to evaluate the choices $\oplus_{\mathsf v}$ in a term $N$, but the order of generators in the interpretation $\llbracket N \rrbracket_{\mathsf v}$ is necessarily fixed. Hence, to simulate a call-by-value reduction, we cannot choose a fixed context stream a priori; all we can say is that for every reduction there is some stream that allows us to simulate it. Specifically, a reduction step $C[N_0 \oplus_{\mathsf v} N_1] \to_{\mathsf v} C[N_j]$, where $C[\ ]$ is a call-by-value term context, is simulated by the following projective step.

$$\cdots\boxed{a}^i.\boxed{b}^j.\boxed{c}^k\cdots D[P_0 \stackrel{b}{\oplus} P_1] \ \to_\pi\ \cdots\boxed{a}^i.\boxed{c}^k\cdots D[P_j]$$

Here, $\llbracket C[N_0 \oplus_{\mathsf v} N_1] \rrbracket_{\mathsf{open}} = \theta \vdash_L D[P_0 \stackrel{b}{\oplus} P_1]$ with $D[\ ]$ a $\Lambda_{\mathsf{PE}}$-context, and $\theta$ gives rise to the sequence of generators $\cdots\boxed{a}.\boxed{b}.\boxed{c}\cdots$ in the call-by-value translation. To simulate the reduction step, if $b$ occupies the $n$-th position in $\theta$, then the $n$-th element of the context stream $s$ must be $j$. Since β-reduction survives the translation and labeling process intact, we may simulate call-by-value probabilistic reduction by projective and β-reduction.

**Theorem 38.** If $N \twoheadrightarrow_{{\mathsf v},\beta_{\mathsf v}} V$ then $\llbracket N \rrbracket_{\mathsf v}^{\,s} \twoheadrightarrow_{\pi,\beta} \llbracket V \rrbracket_{\mathsf v}$ for some stream $s \in \mathbb S$.

## **8 Conclusions and Future Work**

We believe our decomposition of probabilistic choice in λ-calculus to be an elegant and compelling way of restoring confluence, one of the core properties of the λ-calculus. Our probabilistic event λ-calculus captures traditional call-by-name and call-by-value probabilistic reduction, and offers finer control beyond those strategies. Permutative reduction implements a natural and fine-grained equivalence on probabilistic terms as internal rewriting, while projective reduction provides a complementary and more traditional external perspective.

There are a few immediate areas for future work. Firstly, within probabilistic λ-calculus, it is worth exploring whether our decomposition opens up new avenues in semantics. Secondly, our approach might apply to probabilistic reasoning more widely, outside the λ-calculus. Most importantly, we will explore whether our approach can be extended to other computational effects. Our use of streams interprets probabilistic choice as a read operation from an external source, which means that other read operations can be treated similarly. A complementary treatment of write operations would allow us to express a considerable range of effects, including input/output and state.

## **Acknowledgments**

This work was supported by EPSRC Project EP/R029121/1 Typed Lambda-Calculi with Sharing and Unsharing. The first author is partially supported by the ANR project 19CE480014 PPS, the ERC Consolidator Grant 818616 DIAPASoN, and the MIUR PRIN 201784YSZ5 ASPRA. We thank the referees for their diligence and their helpful comments. We are grateful to Chris Barrett and—indirectly—Anupam Das for pointing us to Zantema and Van de Pol's work [27].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **On the k-synchronizability of Systems**

Cinzia Di Giusto, Laetitia Laversa, and Etienne Lozes

Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France {cinzia.di-giusto,laetitia.laversa,etienne.lozes}@univ-cotedazur.fr

**Abstract.** We study k-synchronizability: a system is k-synchronizable if any of its executions, up to reordering causally independent actions, can be divided into a succession of k-bounded interaction phases. We show two results (both for mailbox and peer-to-peer automata): first, the reachability problem is decidable for k-synchronizable systems; second, the membership problem (whether a given system is k-synchronizable) is decidable as well. Our proofs fix several important issues in previous attempts to prove these two results for mailbox automata.

**Keywords:** Verification · Communicating Automata · A/Synchronous communication.

## **1 Introduction**

Asynchronous message-passing is ubiquitous in communication-centric systems; these include high-performance computing, distributed memory management, event-driven programming, and web services orchestration. One of the parameters that play an important role in these systems is whether the number of pending sent messages can be bounded in a predictable fashion, or whether the buffering capacity offered by the communication layer should be unlimited. Clearly, when considering implementation, testing, or verification, bounded asynchrony is preferable to unbounded asynchrony. Indeed, for bounded systems, reachability analysis and invariant inference can be solved by regular model checking [5]. Unfortunately, even though designing a new system in the bounded setting is easier, these techniques no longer apply when the buffering capacity is unbounded, or when the bound is not known a priori. Thus, a question that arises naturally is: how can we bound the "behaviour" of a system even though it operates with unbounded buffers? In a recent work [4], Bouajjani *et al.* introduced the notion of k-synchronizable systems of finite state machines communicating through mailboxes, and showed that the reachability problem is decidable for such systems. Intuitively, a system is k-synchronizable if any of its executions, up to reordering of causally independent actions, can be chopped into a succession of k-bounded interaction phases. Each of these phases starts with at most k send actions, which are followed by at most k receptions. Notice that a system may be k-synchronizable even if some of its executions require buffers of unbounded capacity.

As explained in the present paper, this result, although valid, is surprisingly non-trivial, mostly due to complications introduced by the mailbox semantics of communications. Some of these complications were missed by Bouajjani *et al.* and the algorithm for the reachability problem in [4] suffers from false positives. Another problem is the membership problem for the subclass of k-synchronizable systems: for a given k and a given system of communicating finite state machines, is this system k-synchronizable? The main result in [4] is that this problem is decidable. However, again, the proof of this result contains an important flaw at the very first step that breaks all subsequent developments; as a consequence, the algorithm given in [4] produces both false positives and false negatives.

In this work, we present a new proof of the decidability of the reachability problem together with a new proof of the decidability of the membership problem. Quite surprisingly, the reachability problem is more demanding in terms of causality analysis, whereas the membership problem, although rather intricate, builds on a simpler dependency analysis. We also extend both decidability results to the case of peer-to-peer communication.

**Outline.** Next section recalls the definition of communicating systems and related notions. In Section 3 we introduce k-synchronizability and we give a graphical characterisation of this property. This characterisation corrects Theorem 1 in [4] and highlights the flaw in the proof of the membership problem. Next, in Section 4, we establish the decidability of the reachability problem, which is the core of our contribution and departs considerably from [4]. In Section 5, we show the decidability of the membership problem. Section 6 extends previous results to the peer-to-peer setting. Finally Section 7 concludes the paper discussing other related works. Proofs and some additional material are available at https://hal.archives-ouvertes.fr/hal-02272347.

## **2 Preliminaries**

A communicating system is a set of finite state machines that exchange messages: automata have transitions labelled with either send or receive actions. The paper mainly considers mailbox communication: messages wait to be received in FIFO buffers, one per automaton, storing all messages sent to that automaton regardless of their senders. Section 6, instead, treats peer-to-peer systems; their introduction is therefore delayed to that point.

Let $V$ be a finite set of messages and $P$ a finite set of processes. A send action, denoted $send(p,q,\mathbf v)$, designates the sending of message $\mathbf v$ from process $p$ to process $q$. Similarly, a receive action $rec(p,q,\mathbf v)$ expresses that process $q$ receives message $\mathbf v$ from $p$. We write $a$ to denote a send or receive action. Let $S = \{send(p,q,\mathbf v) \mid p, q \in P, \mathbf v \in V\}$ be the set of send actions and $R = \{rec(p,q,\mathbf v) \mid p, q \in P, \mathbf v \in V\}$ the set of receive actions. $S_p$ and $R_p$ stand for the sets of send and receive actions of process $p$, respectively. Each process is encoded by an automaton, and by abuse of notation we say that a *system* is the parallel composition of its processes.

**Definition 1 (System).** *A system is a tuple* $S = ((L_p, \delta_p, l^0_p) \mid p \in P)$ *where, for each process* $p$*,* $L_p$ *is a finite set of local control states,* $\delta_p \subseteq L_p \times (S_p \cup R_p) \times L_p$ *is the transition relation (also denoted* $l \xrightarrow{a}_p l'$*), and* $l^0_p$ *is the initial state.*

**Definition 2 (Configuration).** *Let* $S = ((L_p, \delta_p, l^0_p) \mid p \in P)$*; a configuration is a pair* $(\vec l, \mathit{Buf})$ *where* $\vec l = (l_p)_{p \in P} \in \Pi_{p \in P} L_p$ *is a global control state of* $S$ *(a local control state for each automaton), and* $\mathit{Buf} = (b_p)_{p \in P} \in (V^*)^P$ *is a vector of buffers, each* $b_p$ *being a word over* $V$*.*

We write $\vec l_0$ to denote the vector of initial states of all processes $p \in P$, and $\mathit{Buf}_0$ stands for the vector of empty buffers. The semantics of a system is defined by the two rules below.

$$\frac{l_p \xrightarrow{send(p,q,\mathbf{v})}_p l'_p \qquad b'_q = b_q \cdot \mathbf{v}}{(\vec l, \mathit{Buf}) \xrightarrow{send(p,q,\mathbf{v})} (\vec l\,[l'_p/l_p], \mathit{Buf}\,[b'_q/b_q])}\;\text{[SEND]} \qquad \frac{l_q \xrightarrow{rec(p,q,\mathbf{v})}_q l'_q \qquad b_q = \mathbf{v} \cdot b'_q}{(\vec l, \mathit{Buf}) \xrightarrow{rec(p,q,\mathbf{v})} (\vec l\,[l'_q/l_q], \mathit{Buf}\,[b'_q/b_q])}\;\text{[RECEIVE]}$$
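The two rules can be transcribed as a small step function. The sketch below is a Python approximation with hypothetical encodings of our own (actions as tuples, the transition relation δ as one dictionary per process), not the paper's formalism.

```python
def step(states, bufs, delta, action):
    """Apply one SEND or RECEIVE rule of the mailbox semantics.
    states: process -> local state; bufs: process -> list of pending messages
    (its mailbox); delta: process -> dict mapping (state, action) to next state.
    Returns the new (states, bufs), or None if the action is not enabled."""
    kind, p, q, v = action
    actor = p if kind == "send" else q
    nxt = delta[actor].get((states[actor], action))
    if nxt is None:
        return None                     # no matching local transition
    states = dict(states)
    bufs = {r: list(b) for r, b in bufs.items()}
    if kind == "send":
        bufs[q].append(v)               # [SEND]: append v to q's mailbox
    else:
        # [RECEIVE]: v must be at the head of q's mailbox (FIFO)
        if not bufs[q] or bufs[q][0] != v:
            return None
        bufs[q].pop(0)
    states[actor] = nxt
    return states, bufs
```

A two-process system where `p` sends one message `m` to `q` then lets `q` receive it exercises both rules in order; trying the receive first is rejected because the mailbox is still empty.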

A send action appends a message to the buffer $b_q$ of the receiver, and a receive action pops the message from the head of this buffer. An execution $e = a_1 \cdots a_n$ is a sequence of actions in $S \cup R$ such that $(\vec l_0, \mathit{Buf}_0) \xrightarrow{a_1} \cdots \xrightarrow{a_n} (\vec l, \mathit{Buf})$ for some $\vec l$ and $\mathit{Buf}$. As usual, $\xRightarrow{e}$ stands for $\xrightarrow{a_1} \cdots \xrightarrow{a_n}$. We write $asEx(S)$ to denote the set of asynchronous executions of a system $S$. In a sequence of actions $e = a_1 \cdots a_n$, a send action $a_i = send(p, q, \mathbf v)$ is *matched* by a reception $a_j = rec(p', q', \mathbf v')$ if $i < j$, $p = p'$, $q = q'$, $\mathbf v = \mathbf v'$, and there is $\ell \geq 1$ such that $a_i$ and $a_j$ are, respectively, the $\ell$-th send and the $\ell$-th reception of $e$ with these properties. A send action $a_i$ is *unmatched* if there is no matching reception in $e$. A *message exchange* of a sequence of actions $e$ is a set either of the form $v = \{a_i, a_j\}$ with $a_i$ matched by $a_j$, or of the form $v = \{a_i\}$ with $a_i$ unmatched. For a message $\mathbf v_i$, we write $v_i$ for the corresponding message exchange. When $v$ is either an unmatched $send(p, q, \mathbf v)$ or a pair of matched actions $\{send(p, q, \mathbf v), rec(p, q, \mathbf v)\}$, we write $proc_S(v)$ for $p$ and $proc_R(v)$ for $q$. Note that $proc_R(v)$ is defined even if $v$ is unmatched. Finally, we write $procs(v)$ for $\{p\}$ in the case of an unmatched send and $\{p, q\}$ in the case of a matched send.
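The matching of sends to receptions described above is purely positional: the $\ell$-th send from $p$ to $q$ with payload $\mathbf v$ meets the $\ell$-th such reception. A small Python sketch, with a hypothetical tuple encoding of actions of our own:

```python
from collections import defaultdict

# Actions are encoded (hypothetically) as tuples:
# ("send", p, q, v) for send(p, q, v) and ("rec", p, q, v) for rec(p, q, v).
def match_sends(e):
    """Return a dict {i: j} mapping each matched send index in e to the index
    of its reception: the ℓ-th send with a given (p, q, v) matches the ℓ-th
    reception with the same (p, q, v). Sends absent from the dict are unmatched."""
    sends, recs = defaultdict(list), defaultdict(list)
    for idx, (kind, p, q, v) in enumerate(e):
        (sends if kind == "send" else recs)[(p, q, v)].append(idx)
    matches = {}
    for key, send_idxs in sends.items():
        for i, j in zip(send_idxs, recs[key]):
            if i < j:          # a reception must come after its send
                matches[i] = j
    return matches
```

On the sequence `send(p,q,v1) · send(p,q,v2) · rec(p,q,v1)`, only the first send is matched (by the reception at index 2); the second stays unmatched.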

An execution imposes a total order on the actions. We are interested in stressing the causal dependencies between messages. We thus make use of message sequence charts (MSCs), which only impose an order between matched pairs of actions and between the actions of the same process. Informally, an MSC will be depicted with vertical timelines (one for each process), where time goes from top to bottom, carrying events (points) that represent the send and receive actions of this process (see Fig. 1). An arc is drawn between two matched events. We will also draw a dashed arc to depict an unmatched send event. An MSC is thus a partially ordered set of events, each corresponding to a send or receive action.

**Definition 3 (MSC).** *A message sequence chart is a tuple* $(Ev, \lambda, \prec)$*, where* $Ev$ *is a finite set of events,* $\lambda : Ev \to S \cup R$ *labels each event with an action, and* $\prec$ *is the partial order generated by a relation* $\prec_{po}$ *and a relation* $\prec_{src}$ *such that:*

	- $\prec_{po}$ *is a partial order on* $Ev$ *such that, for every process* $p$*,* $\prec_{po}$ *induces a total order on the set of events of process* $p$*, i.e., on* $\lambda^{-1}(S_p \cup R_p)$
	- $\prec_{src}$ *matches send events with receive events, such that:*

Fig. 1: (a) and (b): two MSCs that violate causal delivery. (c) and (d): an MSC and its conflict graph

	- ∗ *for all events* $r \in \lambda^{-1}(R)$*, there is exactly one event* $s$ *such that* $s \prec_{src} r$
	- ∗ *for all events* $s \in \lambda^{-1}(S)$*, there is at most one event* $r$ *such that* $s \prec_{src} r$
	- ∗ *for any two events* $s, r$ *such that* $s \prec_{src} r$*, there are* $p, q, \mathbf v$ *such that* $\lambda(s) = send(p, q, \mathbf v)$ *and* $\lambda(r) = rec(p, q, \mathbf v)$*.*

We identify MSCs up to graph isomorphism (i.e., we view an MSC as a labeled graph). For a given *well-formed* (i.e., each reception is matched) sequence of actions $e = a_1 \ldots a_n$, we let $msc(e)$ be the MSC where $Ev = [1..n]$, $\lambda(i) = a_i$, $\prec_{po}$ is the set of pairs of indices $(i, j)$ such that $i < j$ and $\{a_i, a_j\} \subseteq S_p \cup R_p$ for some $p \in P$ (i.e., $a_i$ and $a_j$ are actions of the same process), and $\prec_{src}$ is the set of pairs of indices $(i, j)$ such that $a_i$ is matched by $a_j$. We say that $e = a_1 \ldots a_n$ is a *linearisation* of $msc(e)$, and we write $asTr(S)$ to denote $\{msc(e) \mid e \in asEx(S)\}$, the set of MSCs of system $S$.

Mailbox communication imposes a number of constraints on what and when messages can be read. The precise definition is given below, we now discuss some of the possible scenarios. For instance: if two messages are sent to a same process, they will be received in the same order as they have been sent. As another example, unmatched messages also impose some constraints: if a process p sends an unmatched message to r, it will not be able to send matched messages to r afterwards (Fig. 1a); or similarly, if a process p sends an unmatched message to r, any process q that receives subsequent messages from p will not be able to send matched messages to r afterwards (Fig. 1b). When an MSC satisfies the constraint imposed by mailbox communication, we say that it satisfies causal delivery. Notice that, by construction, all executions satisfy causal delivery.

**Definition 4 (Causal Delivery).** *Let* $(Ev, \lambda, \prec)$ *be an MSC. We say that it satisfies causal delivery if it has a linearisation* $e = a_1 \ldots a_n$ *such that, for any two indices* $i < j$ *with* $a_i = send(p, q, \mathbf v)$ *and* $a_j = send(p', q, \mathbf v')$*, either* $a_j$ *is unmatched, or there are* $i', j'$ *such that* $a_i$ *is matched by* $a_{i'}$*,* $a_j$ *is matched by* $a_{j'}$*, and* $i' < j'$*.*

Our definition enforces the following intuitive property.

**Proposition 1.** *An MSC* msc *satisfies causal delivery if and only if there is a system* <sup>S</sup> *and an execution* <sup>e</sup> <sup>∈</sup> asEx(S) *such that* msc <sup>=</sup> msc(e)*.*

We now recall from [4] the definition of the *conflict graph*, depicting the causal dependencies between message exchanges. Intuitively, we have a dependency whenever two message exchanges have a process in common. For instance, an $SS$ dependency $v \xrightarrow{SS} v'$ between message exchanges $v$ and $v'$ expresses the fact that $v'$ has been sent after $v$, by the same process.

**Definition 5 (Conflict Graph).** *The conflict graph* $CG(e)$ *of a sequence of actions* $e = a_1 \cdots a_n$ *is the labeled graph* $(V, \{\xrightarrow{XY}\}_{X,Y \in \{S,R\}})$ *where* $V$ *is the set of message exchanges of* $e$*, and for all* $X, Y \in \{S, R\}$ *and all* $v, v' \in V$*, there is an* $XY$ *dependency edge* $v \xrightarrow{XY} v'$ *between* $v$ *and* $v'$ *if there are* $i < j$ *such that* $\{a_i\} = v \cap X$*,* $\{a_j\} = v' \cap Y$*, and* $proc_X(v) = proc_Y(v')$*.*

Notice that each linearisation e of an MSC will have the same conflict graph. We can thus talk about an MSC and the associated conflict graph. (As an example see Figs. 1c and 1d.)
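Definition 5 admits a direct transcription. The sketch below reuses a hypothetical tuple encoding of actions of our own (`("send", p, q, v)` / `("rec", p, q, v)`) and a map from send indices to reception indices; each message exchange contributes an S event (the send, performed by p) and, when matched, an R event (the reception, performed by q).

```python
def conflict_graph(e, matches):
    """Edges (v, v', "XY") of the conflict graph CG(e) of Definition 5.
    Vertices are message exchanges, identified here by their send index in e;
    `matches` maps each matched send index to the index of its reception."""
    send_idxs = [i for i, a in enumerate(e) if a[0] == "send"]

    def events(i):
        _, p, q, _ = e[i]
        ev = {"S": (i, p)}              # the S event is the send itself, by p
        if i in matches:                # the R event exists only if matched, by q
            ev["R"] = (matches[i], q)
        return ev

    edges = set()
    for u in send_idxs:
        for w in send_idxs:
            if u == w:
                continue                # no edge from an exchange to itself
            for X, (iu, pu) in events(u).items():
                for Y, (iw, pw) in events(w).items():
                    # proc_X(v) = proc_Y(v'), with the X action before the Y action
                    if iu < iw and pu == pw:
                        edges.add((u, w, X + Y))
    return edges
```

For two messages sent by `p` to `q` and received in order, the graph has exactly the expected SS and RR dependencies from the first exchange to the second.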

We write $v \to v'$ if $v \xrightarrow{XY} v'$ for some $X, Y \in \{S, R\}$, and $v \to^* v'$ if there is a (possibly empty) path from $v$ to $v'$.

## **3** *k***-synchronizable Systems**

In this section, we define k-synchronizable systems. The main contribution of this part is a new characterisation of k-synchronizable executions that corrects the one given in [4].

In the rest of the paper, $k$ denotes a given integer $k \geq 1$. A $k$-*exchange* is a sequence of actions consisting of at most $k$ sends followed by at most $k$ receives matching some of those sends. An MSC is $k$-*synchronous* if it has a linearisation that can be broken into a sequence of $k$-exchanges, such that a message sent during a $k$-exchange cannot be received during a subsequent one: either it is received during the same $k$-exchange, or it remains orphan forever.

## **Definition 6 (**k**-synchronous).** *An MSC* msc *is* k*-synchronous if:*


*An execution* e *is* k*-synchronizable if* msc(e) *is* k*-synchronous.*

We write $\mathsf{sTr}_k(S)$ to denote the set $\{msc(e) \mid e \in \mathsf{asEx}(S) \text{ and } msc(e) \text{ is } k\text{-synchronous}\}$.

*Example 1 (*k*-synchronous MSCs and* k*-synchronizable Executions).*

Fig. 2: (a) the MSC of Example 1.1. (b) the MSC of Example 1.2. (c) the MSC of Example 2 and (d) its conflict graph.


*Comparison with [4]*. In [4], the authors define the set $\mathsf{sEx}_k(S)$ of $k$-synchronous executions of system $S$ in the $k$-synchronous semantics. Nonetheless, as remarked in Example 1.2, not all executions of a system can be divided into $k$-exchanges, even if they are $k$-synchronizable. Thus, in order not to lose any executions, we have decided to reason only on MSCs (called traces in [4]).

Following standard terminology, we say that a set $U \subseteq V$ of vertices is a *strongly connected component* (SCC) of a given graph $(V, \to)$ if between any two vertices $v, v' \in U$ there exist two oriented paths $v \to^* v'$ and $v' \to^* v$. The statement below fixes some issues with Theorem 1 in [4].

**Theorem 1 (Graph Characterisation of** $k$**-synchronous MSCs).** *Let* msc *be a causal delivery MSC. Then* msc *is* $k$*-synchronous iff every SCC in its conflict graph is of size at most* $k$ *and no* $RS$ *edge occurs on any cyclic path.*

*Example 2 (A* 5*-synchronous MSC).* Fig. 2c depicts a 5-synchronous MSC that is not 4-synchronous. Indeed, its conflict graph (Fig. 2d) contains an SCC of size 5 (all vertices are in the same SCC).

*Comparison with [4]*. Bouajjani *et al.* give a characterisation of $k$-synchronous executions similar to ours, but they use the word *cycle* instead of SCC, and the subsequent developments of their paper suggest that they intended to say *Hamiltonian cycle* (i.e., a cyclic path that does not go twice through the same vertex). It is not the case that an MSC is $k$-synchronous if and only if every Hamiltonian cycle in its conflict graph is of size at most $k$ and no $RS$ edge occurs on any cyclic path. Indeed, consider again Example 2: its conflict graph has no Hamiltonian cycle, and the largest cycle that does not revisit a vertex is of size 4 only. But, as we already discussed in Example 2, the corresponding MSC is not 4-synchronous.

As a consequence, the algorithm that is presented in [4] for deciding whether a system is k-synchronizable is not correct as well: the MSC of Fig. 2c would be considered 4-synchronous according to this algorithm, but it is not.
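To make the corrected characterisation concrete, here is a small Python sketch of the check stated in Theorem 1. The encoding of labeled edges as `(u, "XY", v)` triples and the function name are our own assumptions; the sketch relies on the fact that an edge lies on a cyclic path exactly when its target reaches back to its source.

```python
from collections import defaultdict, deque

def is_k_synchronous(vertices, edges, k):
    """Theorem 1 (corrected characterisation): every SCC of the conflict
    graph has at most k vertices, and no RS edge lies on a cyclic path.
    `edges` is a set of (u, label, v) with label in {"SS","SR","RS","RR"}."""
    succ = defaultdict(list)
    for u, _, v in edges:
        succ[u].append(v)

    def reachable(src):
        seen, todo = {src}, deque([src])
        while todo:
            for y in succ[todo.popleft()]:
                if y not in seen:
                    seen.add(y)
                    todo.append(y)
        return seen

    reach = {v: reachable(v) for v in vertices}
    # the SCC of v is the set of vertices mutually reachable with v
    for v in vertices:
        scc = [u for u in vertices if u in reach[v] and v in reach[u]]
        if len(scc) > k:
            return False
    # an edge u -> v lies on a cyclic path iff v reaches u back
    return not any(lbl == "RS" and u in reach[v] for u, lbl, v in edges)
```

On a conflict graph shaped like the one of Example 2, a single SCC of five vertices with no $RS$ edge on a cycle, the check succeeds for $k = 5$ and fails for $k = 4$, in line with Theorem 1.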

## **4 Decidability of Reachability for** *k***-synchronizable Systems**

We show that the reachability problem is decidable for k-synchronizable systems. While proving this result, we have to face several non-trivial aspects of causal delivery that were missed in [4] and that require a completely new approach.

**Definition 7 (**$k$**-synchronizable System).** *A system* $S$ *is* $k$-synchronizable *if all its executions are* $k$*-synchronizable, i.e.,* $\mathsf{sTr}_k(S) = \mathsf{asTr}(S)$*.*

In other words, a system S is k-synchronizable if for every execution e of S, msc(e) may be divided into k-exchanges.

*Remark 1.* In particular, a system may be $k$-synchronizable even if some of its executions fill the buffers with more than $k$ messages. For instance, the only linearisation of the 1-synchronous MSC of Fig. 2b that is an execution of the system needs buffers of size 2.

For a $k$-synchronizable system, the reachability problem reduces to reachability through a $k$-synchronizable execution. To show that $k$-synchronous reachability is decidable, we establish that the set of $k$-synchronous MSCs is regular. More precisely, we want to define a finite state automaton that accepts a sequence $e_1 \cdot e_2 \cdots e_n$ of $k$-exchanges if and only if they satisfy causal delivery.

We start by giving a graph-theoretic characterisation of causal delivery. For this, we define the *extended edges* $v \overset{XY}{\dashrightarrow} v'$ of a given conflict graph. The relation $\overset{XY}{\dashrightarrow}$ is defined in Fig. 3, with $X, Y \in \{S,R\}$. Intuitively, $v \overset{XY}{\dashrightarrow} v'$ expresses that event $X$ of $v$ must happen before event $Y$ of $v'$, due to either their order on the same machine (Rule 1), the fact that a send happens before its matching receive (Rule 2), the mailbox semantics (Rules 3 and 4), or a chain of such dependencies (Rule 5). We observe that in the *extended conflict graph*, obtained by applying these rules, a cyclic dependency appears whenever causal delivery is not satisfied.

$$\begin{array}{ccc}
\text{(Rule 1)}\ \dfrac{v_1 \xrightarrow{XY} v_2}{v_1 \overset{XY}{\dashrightarrow} v_2} &
\text{(Rule 2)}\ \dfrac{v \cap R \neq \emptyset}{v \overset{SR}{\dashrightarrow} v} &
\text{(Rule 3)}\ \dfrac{v_1 \overset{RR}{\dashrightarrow} v_2}{v_1 \overset{SS}{\dashrightarrow} v_2} \\[3ex]
\text{(Rule 4)}\ \dfrac{v_1 \cap R \neq \emptyset \quad v_2 \cap R = \emptyset \quad \mathsf{proc}_R(v_1) = \mathsf{proc}_R(v_2)}{v_1 \overset{SS}{\dashrightarrow} v_2} & &
\text{(Rule 5)}\ \dfrac{v_1 \overset{XY}{\dashrightarrow} v_2 \quad v_2 \overset{YZ}{\dashrightarrow} v_3}{v_1 \overset{XZ}{\dashrightarrow} v_3}
\end{array}$$

Fig. 3: Deduction rules for extended dependency edges of the conflict graph

*Example 3.* Figs. 5a and 5b depict an MSC and its associated conflict graph with some extended edges. This MSC violates causal delivery, and there is a cyclic dependency $v_1 \overset{SS}{\dashrightarrow} v_1$.

**Theorem 2 (Graph-theoretic Characterisation of Causal Delivery).** *An MSC satisfies causal delivery iff there is no cyclic causal dependency of the form* $v \overset{SS}{\dashrightarrow} v$ *for some vertex* $v$ *of its extended conflict graph.*
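Theorem 2 suggests a direct, if naive, decision procedure: saturate the edge set under the rules of Fig. 3 and look for an $SS$ self-loop. The Python sketch below is our own illustration, not the paper's algorithm; the tuple encoding of edges and the per-exchange `matched`/`proc_r` inputs are assumptions.

```python
def violates_causal_delivery(vertices, base_edges, matched, proc_r):
    """Sketch of Theorem 2: saturate extended edges under Rules 1-5
    (Fig. 3), then report whether some v has an SS self-loop.
    base_edges: set of (u, "XY", v) conflict-graph edges (Rule 1);
    matched[v]: True iff exchange v has a receive event;
    proc_r[v]: the (intended) receiver process of exchange v."""
    ext = set(base_edges)                            # Rule 1
    for v in vertices:
        if matched[v]:
            ext.add((v, "SR", v))                    # Rule 2
    for v1 in vertices:
        for v2 in vertices:
            # Rule 4: a matched send to p precedes any unmatched send to p
            if matched[v1] and not matched[v2] and proc_r[v1] == proc_r[v2]:
                ext.add((v1, "SS", v2))
    changed = True
    while changed:
        changed = False
        for (u, l, v) in list(ext):
            if l == "RR" and (u, "SS", v) not in ext:
                ext.add((u, "SS", v))                # Rule 3 (mailbox order)
                changed = True
        for (u, l1, v) in list(ext):
            for (w, l2, x) in list(ext):
                if w == v and l1[1] == l2[0]:
                    e = (u, l1[0] + l2[1], x)        # Rule 5 (composition)
                    if e not in ext:
                        ext.add(e)
                        changed = True
    return any((v, "SS", v) in ext for v in vertices)
```

For instance, if a process first sends an unmatched message $v_1$ to $q$ and then a matched one $v_2$ to $q$, Rule 4 gives $v_2 \overset{SS}{\dashrightarrow} v_1$, which composed with the base edge $v_1 \xrightarrow{SS} v_2$ yields the self-loop $v_1 \overset{SS}{\dashrightarrow} v_1$, flagging the violation.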

Let us now come back to our initial problem: we want to recognise with finite memory the sequences $e_1, e_2, \ldots, e_n$ of $k$-exchanges that, composed, give an MSC that satisfies causal delivery. We proceed by reading the $k$-exchanges one by one in sequence. This entails that, at each step, we have only a partial view of the global conflict graph. Still, we want to determine whether the acyclicity condition of Theorem 2 is satisfied in the global conflict graph. The crucial observation is that only the edges generated by Rule 4 may "go back in time". This means that we have to remember enough information from the previously examined $k$-exchanges to determine whether the current $k$-exchange contains a vertex $v$ that shares an edge with some unmatched vertex $v'$ seen in a previous $k$-exchange, and whether this could participate in a cycle. This is achieved by computing two sets of processes $C_{S,p}$ and $C_{R,p}$ that collect the following information: a process $q$ is in $C_{S,p}$ if it performs a send action causally after an unmatched send to $p$, or if it is the sender of the unmatched send; a process $q$ belongs to $C_{R,p}$ if it receives a message that was sent after some unmatched message directed to $p$. More precisely, we have:

$$\begin{aligned}
C_{S,p} &= \{\mathsf{proc}_S(v) \mid v' \overset{SS}{\dashrightarrow} v \ \&\ v' \text{ is unmatched} \ \&\ \mathsf{proc}_R(v') = p\} \\
C_{R,p} &= \{\mathsf{proc}_R(v) \mid v' \overset{SS}{\dashrightarrow} v \ \&\ v' \text{ is unmatched} \ \&\ \mathsf{proc}_R(v') = p \ \&\ v \cap R \neq \emptyset\}
\end{aligned}$$

These sets abstract and carry from one $k$-exchange to another the necessary information to detect violations of causal delivery. We compute them in any local conflict graph of a $k$-exchange incrementally, i.e., knowing what they were at the end of the previous $k$-exchange, we compute them at the end of the current one. More precisely, let $e = s_1 \cdots s_m \cdot r_1 \cdots r_{m'}$ be a $k$-exchange, $CG(e) = (V, E)$ its conflict graph, and $B : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$ the function that associates to each $p \in \mathbb{P}$ the two sets $B(p) = (C_{S,p}, C_{R,p})$. Then, the conflict graph $CG(e, B)$ is the graph $(V', E')$ with $V' = V \cup \{\psi_p \mid p \in \mathbb{P}\}$ and $E' \supseteq E$ as defined below. For each process $p \in \mathbb{P}$, the "summary node" $\psi_p$ shall account for all past unmatched

$$\begin{array}{c}
e = s_1 \cdots s_m \cdot r_1 \cdots r_{m'} \qquad s_1 \cdots s_m \in S^* \qquad r_1 \cdots r_{m'} \in R^* \qquad 0 \le m' \le m \le k \\
(\vec{\ell}, \mathsf{Buf}_0) \xrightarrow{e} (\vec{\ell'}, \mathsf{Buf}) \ \text{for some } \mathsf{Buf} \\
\text{for all } p \in \mathbb{P},\ B(p) = (C_{S,p}, C_{R,p}) \text{ and } B'(p) = (C'_{S,p}, C'_{R,p}), \text{ where} \\
\mathsf{Unm}_p = \{\psi_p\} \cup \{v \mid v \text{ is unmatched},\ \mathsf{proc}_R(v) = p\} \\
C'_{X,p} = C_{X,p} \cup \{q \mid q \in C_{X,q'},\ v \overset{SS}{\dashrightarrow} \psi_{q'},\ (\mathsf{proc}_R(v) = p \text{ or } v = \psi_p)\} \\
{} \cup \{\mathsf{proc}_X(v) \mid v \in \mathsf{Unm}_p \cap V,\ X = S\} \cup \{\mathsf{proc}_X(v') \mid v \overset{SS}{\dashrightarrow} v',\ v \in \mathsf{Unm}_p,\ v' \cap X \neq \emptyset\} \\
\text{for all } p \in \mathbb{P},\ p \notin C'_{R,p} \\
\hline
(\vec{\ell}, B) \xRightarrow[\text{cd}]{e,k} (\vec{\ell'}, B')
\end{array}$$

Fig. 4: Definition of the relation $\xRightarrow[\text{cd}]{e,k}$

messages sent to $p$ that occurred in some $k$-exchange before $e$. $E'$ is the set $E$ of edges $\xrightarrow{XY}$ among message exchanges of $e$, as in Definition 5, augmented with the following set of extra edges that takes summary nodes into account.

$$\{\psi_p \xrightarrow{SX} v \mid \mathsf{proc}_X(v) \in C_{S,p} \ \&\ v \cap X \neq \emptyset \text{ for some } X \in \{S,R\}\}\tag{1}$$

$$\cup\ \{\psi_p \xrightarrow{SS} v \mid \mathsf{proc}_X(v) \in C_{R,p} \ \&\ v \cap R \neq \emptyset \text{ for some } X \in \{S,R\}\}\tag{2}$$

$$\cup\ \{\psi_p \xrightarrow{SS} v \mid \mathsf{proc}_R(v) \in C_{R,p} \ \&\ v \text{ is unmatched}\}\tag{3}$$

$$\cup\ \{v \xrightarrow{SS} \psi_p \mid \mathsf{proc}_R(v) = p \ \&\ v \cap R \neq \emptyset\} \ \cup\ \{\psi_q \xrightarrow{SS} \psi_p \mid p \in C_{R,q}\}\tag{4}$$

These extra edges summarise and abstract the connections to and from previous $k$-exchanges. Equation (1) considers connections $\xrightarrow{SS}$ and $\xrightarrow{SR}$ that are due to two sent messages or, respectively, a send and a receive on the same process. Equations (2) and (3) consider connections $\xrightarrow{RR}$ and $\xrightarrow{RS}$ that are due to two received messages or, respectively, a receive and a subsequent send on the same process. Notice how the rules in Fig. 3 would then imply the existence of an extended connection $\overset{SS}{\dashrightarrow}$; in particular, Equation (3) abstracts the existence of an edge $\overset{SS}{\dashrightarrow}$ built because of Rule 4. The equations in (4) abstract edges that would connect the current $k$-exchange to previous ones. As before, those edges in the global conflict graph would correspond to extended edges added because of Rule 4 in Fig. 3. Once we have this enriched local view of the conflict graph, we take its extended version. Let $\overset{XY}{\dashrightarrow}$ denote the edges of the extended conflict graph as defined by the rules in Fig. 3, taking into account the new vertices $\psi_p$ and their edges.

Finally, let $S$ be a system and $\xRightarrow[\text{cd}]{e,k}$ be the transition relation given in Fig. 4 among abstract configurations of the form $(\vec{\ell}, B)$, where $\vec{\ell}$ is a global control state of $S$ and $B : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$ is the function defined above that associates to each process $p$ a pair of sets of processes $B(p) = (C_{S,p}, C_{R,p})$. The transition $\xRightarrow[\text{cd}]{e,k}$ updates these sets with respect to the current $k$-exchange $e$. Causal delivery is verified by checking that $p \notin C'_{R,p}$ for all $p \in \mathbb{P}$, meaning that there is no cyclic dependency

Fig. 5: (a) an MSC (b) its associated global conflict graph, (c) the conflict graphs of its k-exchanges

as stated in Theorem 2. The initial state is $(\vec{\ell_0}, B_0)$, where $B_0 : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$ denotes the function such that $B_0(p) = (\emptyset, \emptyset)$ for all $p \in \mathbb{P}$.

*Example 4 (An Invalid Execution).* Let $e = e_1 \cdot e_2$ with $e_1$ and $e_2$ the two 2-exchanges $e_1 = \mathit{send}(q,r,\mathbf{v}_1) \cdot \mathit{send}(q,s,\mathbf{v}_2) \cdot \mathit{rec}(q,s,\mathbf{v}_2)$ and $e_2 = \mathit{send}(p,s,\mathbf{v}_3) \cdot \mathit{rec}(p,s,\mathbf{v}_3) \cdot \mathit{send}(p,r,\mathbf{v}_4) \cdot \mathit{rec}(p,r,\mathbf{v}_4)$. Figs. 5a and 5c show the MSC and the corresponding conflict graph of each of the two 2-exchanges. Note that two edges of the global graph (in blue) "go across" $k$-exchanges. These edges do not belong to the local conflict graphs and are mimicked by the incoming and outgoing edges of summary nodes. The values of the sets $C_{S,r}$ and $C_{R,r}$ at the beginning and at the end of each $k$-exchange are given on the right. All other sets $C_{S,p}$ and $C_{R,p}$ for $p \neq r$ are empty, since there is only one unmatched message, directed to process $r$. Notice how, at the end of the second $k$-exchange, $r \in C'_{R,r}$, signalling that message $\mathbf{v}_4$ violates causal delivery.

*Comparison with [4]*. In [4], the authors define $\xRightarrow[\text{cd}]{e,k}$ in a rather different way: they do not explicitly give a graph-theoretic characterisation of causal delivery; instead they compute, for every process $p$, the set $B(p)$ of processes that either sent an unmatched message to $p$ or received a message from a process in $B(p)$. They then make sure that any message sent to $p$ by a process $q \in B(p)$ is unmatched. According to that definition, the MSC of Fig. 5b would satisfy causal delivery and would be 1-synchronous. However, this is not the case (this MSC does not satisfy causal delivery), as we have shown in Example 3. Due to the above errors, we had to propose a considerably different approach. The extended edges of the conflict graph, the graph-theoretic characterisation of causal delivery, as well as the summary nodes, have no equivalent in [4].

The next lemma proves that Fig. 4 properly characterises causal delivery.

**Lemma 1.** *An MSC* msc *is* $k$*-synchronous iff there is a linearisation* $e = e_1 \cdots e_n$ *such that* $(\vec{\ell_0}, B_0) \xRightarrow[\text{cd}]{e_1,k} \cdots \xRightarrow[\text{cd}]{e_n,k} (\vec{\ell'}, B')$ *for some global state* $\vec{\ell'}$ *and some* $B' : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$*.*

Note that there are only finitely many abstract configurations of the form $(\vec{\ell}, B)$ with $\vec{\ell}$ a tuple of control states and $B : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$. Moreover, since the set of messages is finite, the alphabet of possible $k$-exchanges for a given $k$ is also finite. Therefore $\xRightarrow[\text{cd}]{e,k}$ is a relation on a finite set, and the set $\mathsf{sTr}_k(S)$ of $k$-synchronous MSCs of a system $S$ forms a regular language. It follows that it is decidable whether a given abstract configuration of the form $(\vec{\ell}, B)$ is reachable from the initial configuration following a $k$-synchronizable execution.

**Theorem 3.** *Let* $S$ *be a* $k$*-synchronizable system and* $\vec{\ell}$ *a global control state of* $S$*. The problem whether there exist* $e \in \mathsf{asEx}(S)$ *and* $\mathsf{Buf}$ *such that* $(\vec{\ell_0}, \mathsf{Buf}_0) \xrightarrow{e} (\vec{\ell}, \mathsf{Buf})$ *is decidable.*

*Remark 2.* Deadlock-freedom, unspecified receptions, and the absence of orphan messages are other properties that become decidable for a $k$-synchronizable system, thanks to the regularity of the set of $k$-synchronous MSCs.

## **5 Decidability of** *k***-synchronizability for Mailbox Systems**

We establish the decidability of $k$-synchronizability; our approach is similar to the one of [4], based on the notion of borderline violation, but we adjust it to the new characterisation of $k$-synchronizable executions (Theorem 1).

**Definition 8 (Borderline Violation).** *A non* $k$*-synchronizable execution* $e$ *is a borderline violation if* $e = e' \cdot r$*,* $r$ *is a reception, and* $e'$ *is* $k$*-synchronizable.*

Note that a system $S$ that is not $k$-synchronizable always admits at least one borderline violation $e' \cdot r \in \mathsf{asEx}(S)$ with $r \in R$: indeed, there is at least one execution $e \in \mathsf{asEx}(S)$ which contains a unique minimal prefix of the form $e' \cdot r$ that is not $k$-synchronizable; moreover, since $e'$ is $k$-synchronizable, $r$ cannot be a $k$-exchange of just one send action, therefore it must be a receive action. In order to find such a borderline violation, Bouajjani *et al.* introduced an instrumented system $S'$ that behaves like $S$, except that it contains an extra process $\pi$, and such that a non-deterministically chosen message that should have been sent from a process $p$ to a process $q$ may now be sent from $p$ to $\pi$, and later forwarded by $\pi$ to $q$. In $S'$, each process $p$ has the possibility, instead of sending a message $\mathbf{v}$ to $q$, to deviate this message to $\pi$; if it does so, $p$ continues its execution as if it really had sent it to $q$. Note also that the message sent to $\pi$ gets tagged with the original destination process $q$. Similarly, for each possible reception, a process has the possibility to receive a given message not from the initial sender but from $\pi$. The process $\pi$ has an initial state from which it can receive any message from the system. Each reception makes it go into a different state, from which it is able to send the message back to the original recipient. Once a message is forwarded, $\pi$ reaches its final state and remains idle. The following example illustrates how the instrumented system works.

#### *Example 5 (A Deviated Message).*

Let $e_1, e_2$ be two executions of a system $S$ with MSCs $msc(e_1)$ and $msc(e_2)$, respectively. $e_1$ is not 1-synchronizable; it is borderline in $S$: if we delete the last reception, it becomes indeed 1-synchronizable. $msc(e_2)$ is the MSC obtained from the instrumented system $S'$, where the message $\mathbf{v}_1$ is first deviated to $\pi$ and then sent back to $q$ by $\pi$. Note that $msc(e_2)$ is 1-synchronous. In this case, the 1-synchronous execution $e_2$ of $S'$ "reveals" the existence of a borderline violation of $S$.

For each execution $e \cdot r \in \mathsf{asEx}(S)$ that ends with a reception, there exists an execution $\mathsf{deviate}(e \cdot r) \in \mathsf{asEx}(S')$ where the message exchange associated with the reception $r$ has been deviated to $\pi$; formally, if $e \cdot r = e_1 \cdot s \cdot e_2 \cdot r$ with $r = \mathit{rec}(p, q, \mathbf{v})$ and $s$ its matching send, then


$$\mathsf{deviate}(e \cdot r) = e_1 \cdot \mathit{send}(p, \pi, (q, \mathbf{v})) \cdot \mathit{rec}(p, \pi, (q, \mathbf{v})) \cdot e_2 \cdot \mathit{send}(\pi, q, \mathbf{v}) \cdot \mathit{rec}(\pi, q, \mathbf{v}).$$
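The deviation above is a purely syntactic rewriting of the action sequence, and can be sketched in Python as follows; the tuple encoding `("send"|"rec", sender, receiver, payload)` and the process name `"pi"` are our own conventions, not the paper's.

```python
def deviate(actions):
    """Reroute the message of the final reception through process "pi",
    following deviate(e.r) = e1 . send(p,pi,(q,v)) . rec(p,pi,(q,v))
                             . e2 . send(pi,q,v) . rec(pi,q,v)."""
    *e, r = actions
    kind, p, q, v = r
    assert kind == "rec", "the execution must end with a reception"
    # locate the matching send s of r inside e
    i = e.index(("send", p, q, v))
    e1, e2 = e[:i], e[i + 1:]
    tagged = (q, v)  # the deviated message is tagged with its destination
    return (e1
            + [("send", p, "pi", tagged), ("rec", p, "pi", tagged)]
            + e2
            + [("send", "pi", q, v), ("rec", "pi", q, v)])
```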

**Definition 9 (Feasible Execution, Bad Execution).** *A* $k$*-synchronizable execution* $e'$ *of* $S'$ *is* feasible *if there is an execution* $e \cdot r \in \mathsf{asEx}(S)$ *such that* $\mathsf{deviate}(e \cdot r) = e'$*. A feasible execution* $e' = \mathsf{deviate}(e \cdot r)$ *of* $S'$ *is* bad *if the execution* $e \cdot r$ *is not* $k$*-synchronizable in* $S$*.*

*Example 6 (A Non-feasible Execution).* Let $e'$ be an execution such that $msc(e')$ is as depicted on the right. Clearly, this MSC satisfies causal delivery and could be the execution of some instrumented system $S'$. However, the sequence $e \cdot r$ such that $\mathsf{deviate}(e \cdot r) = e'$ does not satisfy causal delivery, therefore it cannot be an execution of the original system $S$. In other words, the execution $e'$ is not feasible.

**Lemma 2.** *A system* $S$ *is not* $k$*-synchronizable iff there is a* $k$*-synchronizable execution* $e'$ *of* $S'$ *that is feasible and bad.*

As we have already noted, the set of $k$-synchronous MSCs of $S$ is regular. The decision procedure for $k$-synchronizability follows from the fact that, as we will see, the set of MSCs that have a feasible bad execution as a linearisation is regular as well, and that it can be recognised by an (effectively computable) non-deterministic finite state automaton. The decidability of $k$-synchronizability then follows from Lemma 2 and the decidability of the emptiness problem for non-deterministic finite state automata.

**Recognition of Feasible Executions.** We start with the automaton that recognises feasible executions; for this, we revisit the construction we just used for recognising sequences of k-exchanges that satisfy causal delivery.

In the remainder, we assume an execution $e' \in \mathsf{asEx}(S')$ that contains exactly one send of the form $\mathit{send}(p, \pi, (q, \mathbf{v}))$ and one reception of the form $\mathit{rec}(\pi, q, \mathbf{v})$, this reception being the last action of $e'$. Let $(V, \{\xrightarrow{XY}\}_{X,Y \in \{S,R\}})$ be the conflict graph of $e'$. There are two uniquely determined vertices $\upsilon_{\mathsf{start}}, \upsilon_{\mathsf{stop}} \in V$ such that $\mathsf{proc}_R(\upsilon_{\mathsf{start}}) = \pi$ and $\mathsf{proc}_S(\upsilon_{\mathsf{stop}}) = \pi$, which correspond, respectively, to the first and last message exchanges of the deviation. The conflict graph of $e \cdot r$ is then obtained by merging these two nodes.

**Lemma 3.** *The execution* $e'$ *is not feasible iff there is a vertex* $v$ *in the conflict graph of* $e'$ *such that* $\upsilon_{\mathsf{start}} \overset{SS}{\dashrightarrow} v \xrightarrow{RR} \upsilon_{\mathsf{stop}}$*.*

In order to decide whether an execution $e'$ is feasible, we want to forbid that a send action $\mathit{send}(p', q, \mathbf{v}')$ that happens causally after $\upsilon_{\mathsf{start}}$ is matched by a receive $\mathit{rec}(p', q, \mathbf{v}')$ that happens causally before the reception $\upsilon_{\mathsf{stop}}$. As a matter of fact, this boils down to treating the deviated send action as an unmatched send. So we will consider sets of processes $C^{\pi}_S$ and $C^{\pi}_R$ similar to the ones used for $\xRightarrow[\text{cd}]{e,k}$, but with the goal of computing which actions happen causally after the send to $\pi$. We also introduce a summary node $\psi_{\mathsf{start}}$ and extra edges, following the same principles as in the previous section. Formally, let $B : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$, $C^{\pi}_S, C^{\pi}_R \subseteq \mathbb{P}$ and $e \in S^{\le k}R^{\le k}$ be fixed, and let $CG(e, B) = (V', E')$ be the conflict graph with summary nodes for unmatched sent messages as defined in the previous section. The local conflict graph $CG(e, B, C^{\pi}_S, C^{\pi}_R)$ is defined as the graph $(V'', E'')$ where $V'' = V' \cup \{\psi_{\mathsf{start}}\}$ and $E''$ is $E'$ augmented with

$$\{\psi_{\mathsf{start}} \xrightarrow{SX} v \mid \mathsf{proc}_X(v) \in C^{\pi}_S \ \&\ v \cap X \neq \emptyset \text{ for some } X \in \{S,R\}\}$$

$$\cup\ \{\psi_{\mathsf{start}} \xrightarrow{SS} v \mid \mathsf{proc}_X(v) \in C^{\pi}_R \ \&\ v \cap R \neq \emptyset \text{ for some } X \in \{S,R\}\}$$

$$\cup\ \{\psi_{\mathsf{start}} \xrightarrow{SS} v \mid \mathsf{proc}_R(v) \in C^{\pi}_R \ \&\ v \text{ is unmatched}\} \ \cup\ \{\psi_{\mathsf{start}} \xrightarrow{SS} \psi_p \mid p \in C^{\pi}_R\}$$

As before, we consider the "closure" $\overset{XY}{\dashrightarrow}$ of these edges under the rules of Fig. 3. The transition relation $\xRightarrow[\text{feas}]{e,k}$ is defined in Fig. 6. It relates abstract configurations of the form $(\vec{\ell}, B, C, \mathsf{dest}_\pi)$ with $C = (C^{\pi}_S, C^{\pi}_R)$ and $\mathsf{dest}_\pi \in \mathbb{P} \cup \{\bot\}$ storing to whom the message deviated to $\pi$ was supposed to be delivered. Thus, the initial abstract configuration is $(\vec{\ell_0}, B_0, (\emptyset, \emptyset), \bot)$, where $\bot$ means that the process $\mathsf{dest}_\pi$ has not been determined yet. It is set as soon as the send to process $\pi$ is encountered.

**Lemma 4.** *Let* $e'$ *be an execution of* $S'$*. Then* $e'$ *is a* $k$*-synchronizable feasible execution iff there are* $e'' = e_1 \cdots e_n \cdot \mathit{send}(\pi, q, \mathbf{v}) \cdot \mathit{rec}(\pi, q, \mathbf{v})$ *with* $e_1, \ldots, e_n \in S^{\le k}R^{\le k}$*,* $B' : \mathbb{P} \to (2^{\mathbb{P}} \times 2^{\mathbb{P}})$*,* $\vec{C'} \in (2^{\mathbb{P}})^2$*, and a tuple of control states* $\vec{\ell'}$ *such that* $msc(e') = msc(e'')$*,* $\pi \in C_{R,q}$ *(with* $B'(q) = (C_{S,q}, C_{R,q})$*), and*

$$(\vec{\ell_0}, B_0, (\emptyset, \emptyset), \bot) \xRightarrow[\text{feas}]{e_1,k} \cdots \xRightarrow[\text{feas}]{e_n,k} (\vec{\ell'}, B', \vec{C'}, q).$$

$$\begin{array}{c}
(\vec{\ell}, B) \xRightarrow[\text{cd}]{e,k} (\vec{\ell'}, B') \qquad e = a_1 \cdots a_n \qquad (\forall v)\ \mathsf{proc}_S(v) \neq \pi \\
(\forall v, v')\ \mathsf{proc}_R(v) = \mathsf{proc}_R(v') = \pi \implies v = v' \wedge \mathsf{dest}_\pi = \bot \\
(\forall v)\ v = \mathit{send}(p, \pi, (q, \mathbf{v})) \implies \mathsf{dest}'_\pi = q \qquad \mathsf{dest}_\pi \neq \bot \implies \mathsf{dest}'_\pi = \mathsf{dest}_\pi \\
C^{\pi\prime}_X = C^{\pi}_X \cup \{\mathsf{proc}_X(v') \mid v \overset{SS}{\dashrightarrow} v' \ \&\ v' \cap X \neq \emptyset \ \&\ (\mathsf{proc}_R(v) = \pi \text{ or } v = \psi_{\mathsf{start}})\} \\
{} \cup \{\mathsf{proc}_S(v) \mid \mathsf{proc}_R(v) = \pi \ \&\ X = S\} \\
{} \cup \{p \mid p \in C_{X,q} \ \&\ v \overset{SS}{\dashrightarrow} \psi_q \ \&\ (\mathsf{proc}_R(v) = \pi \text{ or } v = \psi_{\mathsf{start}})\} \\
\mathsf{dest}'_\pi \notin C^{\pi\prime}_R \\
\hline
(\vec{\ell}, B, C^{\pi}_S, C^{\pi}_R, \mathsf{dest}_\pi) \xRightarrow[\text{feas}]{e,k} (\vec{\ell'}, B', C^{\pi\prime}_S, C^{\pi\prime}_R, \mathsf{dest}'_\pi)
\end{array}$$

Fig. 6: Definition of the relation $\xRightarrow[\text{feas}]{e,k}$

*Comparison with [4]*. In [4], the authors verify that an execution is feasible with a *monitor* which reviews the actions of the execution and adds processes that are no longer allowed to send a message to the receiver of the message deviated to $\pi$. Unfortunately, we have here a problem similar to the one mentioned in the previous comparison paragraph. According to their monitor, the following execution $e' = \mathsf{deviate}(e \cdot r)$ is feasible, i.e., $e'$ is runnable in $S'$ and $e \cdot r$ is runnable in $S$:

$$\begin{aligned} e' = {} & \mathit{send}(q,\pi,(r,\mathbf{v}_1)) \cdot \mathit{rec}(q,\pi,(r,\mathbf{v}_1)) \cdot \mathit{send}(q,s,\mathbf{v}_2) \cdot \mathit{rec}(q,s,\mathbf{v}_2) \cdot {} \\ & \mathit{send}(p,s,\mathbf{v}_3) \cdot \mathit{rec}(p,s,\mathbf{v}_3) \cdot \mathit{send}(p,r,\mathbf{v}_4) \cdot \mathit{rec}(p,r,\mathbf{v}_4) \cdot {} \\ & \mathit{send}(\pi,r,\mathbf{v}_1) \cdot \mathit{rec}(\pi,r,\mathbf{v}_1) \end{aligned}$$

However, this execution is not feasible, because there is a causal dependency between $\mathbf{v}_1$ and $\mathbf{v}_3$. In [4], this execution would then be considered as feasible and would therefore belong to the set $\mathsf{sTr}_k(S')$. Yet there is no corresponding execution in $\mathsf{asTr}(S)$; the comparison, and therefore the $k$-synchronizability check, could be distorted and appear as a false negative.

**Recognition of Bad Executions.** Finally, we define a non-deterministic finite state automaton that recognises MSCs of bad executions, i.e., feasible executions $e' = \mathsf{deviate}(e \cdot r)$ such that $e \cdot r$ is not $k$-synchronizable. We come back to the "non-extended" conflict graph, without edges of the form $\overset{XY}{\dashrightarrow}$. Let $\mathsf{Post}^*(v) = \{v' \in V \mid v \to^* v'\}$ be the set of vertices reachable from $v$, and let $\mathsf{Pre}^*(v) = \{v' \in V \mid v' \to^* v\}$ be the set of vertices co-reachable from $v$. For a set of vertices $U \subseteq V$, let $\mathsf{Post}^*(U) = \bigcup\{\mathsf{Post}^*(v) \mid v \in U\}$ and $\mathsf{Pre}^*(U) = \bigcup\{\mathsf{Pre}^*(v) \mid v \in U\}$.
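These two operators are plain forward and backward reachability. A small Python sketch (our own encoding of the edge relation as pairs) makes the symmetry explicit:

```python
from collections import defaultdict, deque

def post_star(edges, starts):
    """Post*(U): vertices reachable (v ->* v') from some v in `starts`."""
    succ = defaultdict(list)
    for u, w in edges:
        succ[u].append(w)
    seen, todo = set(starts), deque(starts)
    while todo:
        for y in succ[todo.popleft()]:
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen

def pre_star(edges, starts):
    """Pre*(U): co-reachable vertices, i.e. Post* over reversed edges."""
    return post_star([(w, u) for u, w in edges], starts)
```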

**Lemma 5.** *The feasible execution* $e'$ *is bad iff one of the following two conditions holds:*


In order to determine whether a given message exchange $v$ of $CG(e')$ should be counted as reachable (resp. co-reachable), we will compute, at the entry and exit of every $k$-exchange of $e'$, which processes are "reachable" or "co-reachable".

#### *Example 7 (Reachable and Co-reachable Processes).*

Consider the MSC on the right, made of five 1-exchanges. While sending message $(s, \mathbf{v}_0)$, which corresponds to $\upsilon_{\mathsf{start}}$, process $r$ becomes "reachable": any subsequent message exchange that involves $r$ corresponds to a vertex of the conflict graph that is reachable from $\upsilon_{\mathsf{start}}$. While sending $\mathbf{v}_2$, process $s$ becomes "reachable", because process $r$ will be reachable when it receives message $\mathbf{v}_2$. Similarly, $q$ becomes reachable after receiving $\mathbf{v}_3$ because $r$ was reachable when it sent $\mathbf{v}_3$, and $p$ becomes reachable after receiving $\mathbf{v}_4$ because $q$ was reachable when it sent $\mathbf{v}_4$. Co-reachability works similarly, but reasoning backwards on the timelines. For instance, process $s$ stops being "co-reachable" when it receives $\mathbf{v}_0$, process $r$ stops being co-reachable after it receives $\mathbf{v}_2$, and process $p$ stops being co-reachable by sending $\mathbf{v}_1$. The only message that is sent by a process that is both reachable and co-reachable at the instant of the sending is $\mathbf{v}_2$; therefore it is the only message that will be counted as contributing to the SCC.

More formally, let $e$ be a sequence of actions, $CG(e)$ its conflict graph, and $P, Q$ two sets of processes; the sets $\mathsf{Post}_e(P) = \mathsf{Post}^*(\{v \mid \mathsf{procs}(v) \cap P \neq \emptyset\})$ and $\mathsf{Pre}_e(Q) = \mathsf{Pre}^*(\{v \mid \mathsf{procs}(v) \cap Q \neq \emptyset\})$ are introduced to represent the local view, through $k$-exchanges, of $\mathsf{Post}^*(\upsilon_{\mathsf{start}})$ and $\mathsf{Pre}^*(\upsilon_{\mathsf{stop}})$. For instance, for $e$ as in Example 7, we get $\mathsf{Post}_e(\{\pi\}) = \{(s, \mathbf{v}_0), \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_0\}$ and $\mathsf{Pre}_e(\{\pi\}) = \{\mathbf{v}_0, \mathbf{v}_2, \mathbf{v}_1, (s, \mathbf{v}_0)\}$. In each $k$-exchange $e_i$, the size of the intersection of $\mathsf{Post}_{e_i}(P)$ and $\mathsf{Pre}_{e_i}(Q)$ gives the local contribution of the current $k$-exchange to the computation of the size of the global SCC. In the transition relation $\xRightarrow[\text{bad}]{e,k}$, this value is stored in the variable $\mathsf{cnt}$. The last ingredient is to recognise whether an $RS$ edge belongs to the SCC. To this aim, we use a function $\mathsf{lastisRec} : \mathbb{P} \to \{\mathsf{True}, \mathsf{False}\}$ that stores, for each process, whether its last action in the previous $k$-exchange was a reception. Then, depending on the value of this variable and on whether a node is in the current SCC, the value of $\mathsf{sawRS}$ is set accordingly.
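The local bookkeeping just described can be sketched as follows, under our own encoding where `procs` maps each vertex of a $k$-exchange to the set of processes it involves; this illustrates only the computation of the local SCC contribution, not the full $\xRightarrow[\text{bad}]{e,k}$ relation:

```python
from collections import defaultdict, deque

def local_scc_contribution(edges, procs, P, Q):
    """Vertices of one k-exchange both reachable from a process in P and
    co-reachable from a process in Q, i.e. Post_e(P) ∩ Pre_e(Q); the size
    of this set is what the counter cnt accumulates."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, w in edges:
        succ[u].append(w)
        pred[w].append(u)

    def closure(step, starts):
        seen, todo = set(starts), deque(starts)
        while todo:
            for y in step[todo.popleft()]:
                if y not in seen:
                    seen.add(y)
                    todo.append(y)
        return seen

    post = closure(succ, [v for v in procs if procs[v] & P])
    pre = closure(pred, [v for v in procs if procs[v] & Q])
    return post & pre
```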

The transition relation $\xrightarrow[\mathsf{bad}]{e,k}$ defined in Fig. 7 deals with abstract configurations of the form (P, Q, cnt, sawRS, lastisRec) where P, Q ⊆ ℙ, sawRS is a boolean value, and cnt is a counter bounded by k + 2. We denote by lastisRec<sub>0</sub> the function such that lastisRec<sub>0</sub>(p) = False for all p ∈ ℙ.

**Lemma 6.** *Let* e′ *be a feasible* k*-synchronizable execution of* S′*. Then* e′ *is a bad execution iff there are* e′′ = e<sub>1</sub> ··· e<sub>n</sub> · send(π, q, **v**) · rec(π, q, **v**) *with* e<sub>1</sub>,...,e<sub>n</sub> ∈ S<sup>≤k</sup>R<sup>≤k</sup> *and* msc(e′) = msc(e′′)*,* P′, Q ⊆ ℙ*,* sawRS ∈ {True, False}*,* cnt ∈ {0,...,k + 2}*, such that*

$$(\{\pi\}, Q, 0, \mathsf{False}, \mathsf{lastisRec}\_0) \xrightarrow[\mathsf{bad}]{e\_1, k} \dots \xrightarrow[\mathsf{bad}]{e\_n, k} (P', \{\pi\}, \mathsf{cnt}, \mathsf{sawRS}, \mathsf{lastisRec})$$

$$\frac{\begin{aligned} &P' = \mathsf{procs}(\mathsf{Post}\_e(P)) \qquad Q = \mathsf{procs}(\mathsf{Pre}\_e(Q')) \qquad SCC\_e = \mathsf{Post}\_e(P) \cap \mathsf{Pre}\_e(Q')\\ &\mathsf{cnt}' = \min(k + 2, \mathsf{cnt} + n) \quad \text{where } n = |SCC\_e|\\ &\mathsf{lastisRec}'(q) \Leftrightarrow (\exists v \in SCC\_e.\ \mathsf{proc}\_R(v) = q \wedge v \cap R \neq \emptyset) \vee (\mathsf{lastisRec}(q) \wedge \neg\exists v \in V.\ \mathsf{proc}\_S(v) = q)\\ &\mathsf{sawRS}' = \mathsf{sawRS} \vee (\exists v \in SCC\_e)(\exists p \in \mathbb{P} \setminus \{\pi\})\ \mathsf{proc}\_S(v) = p \wedge \mathsf{lastisRec}(p) \wedge p \in P \cap Q \end{aligned}}{(P, Q, \mathsf{cnt}, \mathsf{sawRS}, \mathsf{lastisRec}) \xrightarrow[\mathsf{bad}]{e,k} (P', Q', \mathsf{cnt}', \mathsf{sawRS}', \mathsf{lastisRec}')}$$

Fig. 7: Definition of the relation $\xrightarrow[\mathsf{bad}]{e,k}$

*and at least one of the two holds: either* sawRS = True*, or* cnt = k + 2*.*

*Comparison with [4]*. As for the notion of feasibility, to determine whether an execution is bad, the authors of [4] use a monitor that builds a path between the send to process π and the send from π. In addition to the problems related to the wrong characterisation of k-synchronizability, this monitor can not only detect an RS edge when there should be none, but can also miss RS edges when they exist. In general, the problem arises because the path is constructed by considering only one endpoint at a time.

We can finally conclude that:

**Theorem 4.** *The* <sup>k</sup>*-synchronizability of a system* <sup>S</sup> *is decidable for* <sup>k</sup> <sup>≥</sup> <sup>1</sup>*.*

## **6** *k***-synchronizability for Peer-to-Peer Systems**

In this section, we apply k-synchronizability to peer-to-peer systems. A peer-to-peer system is a composition of communicating automata where each pair of machines exchanges messages via two private FIFO buffers, one per direction of communication. Here we only give an overview of what changes with respect to the mailbox setting.

Causal delivery reveals the order imposed by FIFO buffers. Definition 4 must then be adapted to account for peer-to-peer communication. For instance, two messages that are sent to a same process p by two different processes can be received by p in any order, regardless of any causal dependency between the two sends. Thus, checking causal delivery in peer-to-peer systems is easier than in the mailbox setting, as we do not have to carry information on causal dependencies.

Within a peer-to-peer architecture, MSCs and conflict graphs are defined as in the mailbox setting. Indeed, they represent dependencies over machines, i.e., the order in which actions are performed on a given machine, and between the send and the reception of a same message; they do not depend on the type of communication. The notion of k-exchange also remains unchanged.

**Decidability of Reachability for** k**-synchronizable Peer-to-Peer Systems.** To establish the decidability of reachability for k-synchronizable peer-to-peer systems, we define a transition relation $\xrightarrow[\mathsf{cd\text{-}p2p}]{e,k}$ for a sequence of actions e describing a k-exchange. As for mailbox systems, if a send action is unmatched in the current k-exchange, it stays orphan forever. Moreover, after a process p has sent an orphan message to a process q, p is forbidden to send any matched message to q. Nonetheless, as a consequence of the simpler definition of causal delivery, we no longer need to work on the conflict graph. Summary nodes and extended edges are not needed, and all the necessary information is in a function B that, for each process p, simply records the forbidden senders for p.

The characterisation of a k-synchronizable execution is the same as for mailbox systems, as the type of communication is not relevant. We can thus conclude, as in the mailbox setting, that reachability is decidable.

**Theorem 5.** *Let* S *be a* k*-synchronizable system and* l *a global control state of* S*. The problem whether there exist* e ∈ asEx(S) *and* Buf *such that* (l<sub>0</sub>, Buf<sub>0</sub>) $\xrightarrow{e}$ (l, Buf) *is decidable.*

**Decidability of** k**-synchronizability for Peer-to-Peer Systems.** As in the mailbox setting, the detection of a borderline execution determines whether a system is k-synchronizable.

The transition relation $\xrightarrow[\mathsf{feas\text{-}p2p}]{e,k}$ allows us to obtain feasible executions. Differently from the mailbox setting, we need to save not only the recipient dest<sub>π</sub> but also the sender of the delayed message (information stored in the variable exp<sub>π</sub>). The transition rule then checks that no message violates causal delivery, i.e., that no message is sent by exp<sub>π</sub> to dest<sub>π</sub> after the deviation. Finally, the recognition of bad executions works in the same way as for mailbox systems. The characterisation of a bad execution and the definition of $\xrightarrow[\mathsf{bad\text{-}p2p}]{e,k}$ are, therefore, the same.

As for mailbox systems, we can thus conclude that, for a given k, k-synchronizability is decidable.

**Theorem 6.** *The* <sup>k</sup>*-synchronizability of a system* <sup>S</sup> *is decidable for* <sup>k</sup> <sup>≥</sup> <sup>1</sup>*.*

## **7 Concluding Remarks and Related Works**

In this paper we have studied k-synchronizability for mailbox and peer-to-peer systems. We have corrected the reachability and decidability proofs given in [4]. The flaws in [4] concern fundamental points, and we had to propose a considerably different approach. The extended edges of the conflict graph, the graph-theoretic characterisation of causal delivery, and the summary nodes have no equivalent in [4]. The transition relations $\xrightarrow[\mathsf{feas}]{e,k}$ and $\xrightarrow[\mathsf{bad}]{e,k}$, building on the graph-theoretic characterisations of causal delivery and k-synchronizability, depart considerably from the proposal in [4].

We conclude by commenting on some other related works. The idea of "communication layers" is present in the early works of Elrad and Francez [8] and Chou and Gafni [7]. More recently, Chaouch-Saad *et al.* [6] verified some consensus algorithms using the Heard-Of Model, which proceeds by "communication-closed rounds". The idea that an asynchronous system may have an "equivalent" synchronous counterpart has also been widely studied. Lipton's reduction [14] reschedules an execution so as to move the receive actions as close as possible to their corresponding sends. Reduction has recently received increasing interest for verification purposes, e.g., by Kragl *et al.* [12] and Gleissenthal *et al.* [11].

Existentially bounded communication systems have been studied by Genest *et al.* [10,15]: a system is existentially k-bounded if any execution can be rescheduled in order to become k-bounded. This approach targets a broader class of systems than k-synchronizability, because it does not require that the execution can be chopped into communication-closed rounds. From the perspective of the current work, an interesting result is the decidability of existential k-boundedness for deadlock-free systems of communicating machines with peer-to-peer channels. Despite the more general definition, these older results are incomparable with the present ones, which deal with systems communicating with mailboxes rather than peer-to-peer channels.

Basu and Bultan studied a notion they also called synchronizability, but it differs from the notion studied in the present work; synchronizability and k-synchronizability define incomparable classes of communicating systems. The proofs of the decidability of synchronizability [3,2] were shown to have flaws by Finkel and Lozes [9]. A question left open in their paper is whether synchronizability is decidable for mailbox communications, as originally claimed by Basu and Bultan. Akroun and Salaün also defined a property they called stability [1], which shares many similarities with the synchronizability notion in [2].

Context-bounded model-checking is yet another approach to the automatic verification of concurrent systems. La Torre *et al.* studied systems of communicating machines extended with a calling stack, and showed that, under some conditions on the interplay between stack actions and communications, context-bounded reachability is decidable [13]. A context switch occurs in an execution each time two consecutive actions are performed by different participants. Thus, while k-synchronizability limits the number of consecutive sendings, bounded context-switch analysis limits the number of times two consecutive actions are performed by two different processes.

As for future work, it would be interesting to explore how context-boundedness and communication-closed rounds could be composed. Moreover, refinements of the definition of k-synchronizability can also be considered. For instance, we conjecture that the current development can be greatly simplified if we forbid linearisations that do not correspond to actual executions.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## General Supervised Learning as Change Propagation with Delta Lenses

Zinovy Diskin

McMaster University, Hamilton, Canada diskinz@mcmaster.ca

Abstract. Delta lenses are an established mathematical framework for modelling and designing bidirectional model transformations (Bx). Following recent observations by Fong et al., the paper extends the delta lens framework with a new ingredient: learning over a parameterized space of model transformations seen as functors. We will define the notion of an asymmetric learning delta lens with amendment (ala-lens), and show how ala-lenses can be organized into a symmetric monoidal (sm) category. We also show that sequential and parallel compositions of well-behaved (wb) ala-lenses are again wb, so that wb ala-lenses constitute a full sm-subcategory of ala-lenses.

## 1 Introduction

The goal of the paper is to develop a formal model of *supervised learning* in the very general context of *bidirectional model transformation* or *Bx*, i.e., synchronization of two arbitrarily complex structures (called *models*) related by a transformation.<sup>1</sup> Rather than learning parameterized functions between Euclidean spaces, as is typical for machine learning (ML), we will consider learning mappings between model spaces and formalize them as parameterized functors between categories, f: P×**A** → **B**, with P being a parameter space. The basic ML-notion of a *training pair* (A, B′) ∈ **A**<sub>0</sub> × **B**<sub>0</sub> will be considered as an inconsistency between models caused by a change (*delta*) v: B → B′ of the target model B = f(p, A), p ∈ P, that was initially consistent with A w.r.t. the transformation (functor) f(p, \_). An inconsistency is repaired by an appropriate change of the source structure, u: A → A′, a change of the parameter p to p′, and an *amendment* of the target structure v@: B′ → B@ so that f(p′, A′) = B@ is a consistent state of the parameterized two-model system.

The setting above without parameterization and learning (i.e., p′ = p always holds), and without amendment (v@ = id<sub>B′</sub> always holds), is well known in the Bx literature under the name of *delta lenses* — mathematical structures, in

<sup>1</sup>The term Bx refers to a wide area including file synchronization, data exchange in databases, and model synchronization in Model-Driven software Engineering (MDE); see [7] for a survey. In the present paper, Bx will mainly refer to Bx in the MDE context.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 177–197, 2020.

https://doi.org/10.1007/978-3-030-45231-5\_10

which consistency restoration via change propagation is modelled by functorial-like algebraic operations over categories [12,6]. There are several types of delta lenses tailored for modelling different synchronization tasks and scenarios, particularly symmetric and asymmetric ones. In this paper, we only consider asymmetric delta lenses and will often omit explicitly mentioning these attributes. Despite their extra generality, (delta) lenses have proved useful in the design and implementation of practical model synchronization systems with triple graph grammars (TGG) [5,2]; enriching lenses with amendment is a recent extension of the framework motivated and formalized in [11]. A major advantage of the lens framework for synchronization is its compositionality: a lens satisfying several equational laws specifying basic synchronization requirements is called *well-behaved (wb)*, and basic lens theorems state that the sequential and parallel composition of wb lenses is again wb. In practical applications, this allows the designer of a complex synchronizer to avoid integration testing: if elementary synchronizers are tested and proved to be wb, their composition is automatically wb too.

The present paper makes the following contributions to the delta lens framework for Bx. a) We motivate model synchronization enriched with learning and, moreover, with *categorical* learning, in which the parameter space is a category, and introduce the notion of a *wb asymmetric learning (delta) lens* with *amendment* (a *wb ala-lens*) (this is the content of Sect. 3). b) We prove compositionality of wb ala-lenses and show how their universe can be organized into a symmetric monoidal (sm) category (Theorems 1–3 in Sect. 4). All proofs (rather straightforward but notationally laborious) can be found in the long version of the paper [9]. One more compositional result is c) a definition of a *compositional bidirectional transformation language* (Def. 6) that formalizes an important requirement on model synchronization tools, which (surprisingly) is missing from the Bx literature. Background Sect. 2 provides a simple example demonstrating the main concepts of Bx and delta lenses in the MDE context. Section 5 briefly surveys related work, and Sect. 6 concludes.

*Notation.* Given a category **A**, its objects are denoted by capital letters A, A′, etc. to recall that in MDE applications, objects are complex structures, which themselves have elements a, a′, ...; the collection of all objects of category **A** is denoted by **A**<sub>0</sub>. An arrow with domain A ∈ **A**<sub>0</sub> is written as u: A → \_ or u ∈ **A**(A, \_); we also write dom(u) = A (and sometimes u.dom = A to shorten formulas). Similarly, the formula u: \_ → A′ denotes an arrow with codomain u.cod = A′. Given a functor f: **A** → **B**, its object function is denoted by f<sub>0</sub>: **A**<sub>0</sub> → **B**<sub>0</sub>.

A subcategory **B** ⊂ **A** is called *wide* if it has the same objects. All categories we consider in the paper are small.

## 2 Background: Update propagation and delta lenses

Although Bx ideas work well only in domains conforming to the slogan *any implementation satisfying the specification is good enough* such as code generation (see [10] for discussion), and have limited applications in databases (only so called updatable views can be treated in the Bx-way), we will employ a simple database example: it allows demonstrating the core ideas without any special domain knowledge required by typical Bx-amenable areas. The presentation will be semi-formal as our goal is to motivate the delta lens formalism that abstracts the details away rather than formalize the example as such.

#### 2.1 Why deltas

Bx-lenses first appeared in the work on file synchronization, and if we have two sets of strings, say, B = {John, Mary} and B′ = {Jon, Mary}, we can readily see the difference: John ≠ Jon but Mary = Mary. We thus have a structure in-between B and B′ (which may be rather complex if B and B′ are big files), but this structure can be recovered by string matching, and thus updates can be identified with pairs of states. The situation changes dramatically if B and B′ are object structures, e.g., B = {o<sub>1</sub>, o<sub>2</sub>} with Name(o<sub>1</sub>) = John, Name(o<sub>2</sub>) = Mary, and similarly B′ = {o′<sub>1</sub>, o′<sub>2</sub>} with Name(o′<sub>1</sub>) = Jon, Name(o′<sub>2</sub>) = Mary. Now string matching does not say much: it may happen that o<sub>1</sub> and o′<sub>1</sub> are the same object (think of a typo in the dataset), while o<sub>2</sub> and o′<sub>2</sub> are different (although equally named) objects. Of course, for better matching we could use full names or ID numbers or something similar (called, in the database parlance, primary keys), but absolutely reliable keys are rare, and typos and bugs can compromise them anyway. Thus, for object structures that Bx needs to keep in sync, deltas between models need to be independently specified, e.g., by specifying a *sameness relation* u ⊆ B×B′ between models. For example, u = {(o<sub>1</sub>, o′<sub>1</sub>)} says that John@B and Jon@B′ are the same person while Mary@B and Mary@B′ are not. Hence, model spaces in Bx are categories (objects are models and arrows are update/delta specifications) rather than sets (codiscrete categories).
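The point can be made concrete with a tiny Python sketch (all data invented for illustration): the same pair of states admits two different deltas, and only the delta, not the states, tells the two stories apart.

```python
# Two states of an object model: oid -> name (illustrative data).
B  = {"o1": "John", "o2": "Mary"}
B2 = {"p1": "Jon",  "p2": "Mary"}   # B2 plays the role of B'

# Two DIFFERENT deltas (sameness relations) between the same states:
u_same = {("o1", "p1"), ("o2", "p2")}  # both objects survive; John's name was fixed
u_new  = {("o1", "p1")}                # o2's Mary left; p2 is a different Mary

def created(after, u):
    """Objects of the updated state with no pre-image under delta u."""
    return set(after) - {q for (_, q) in u}

print(created(B2, u_same))  # no freshly created objects
print(created(B2, u_new))   # p2 is freshly created
```

String matching over the states alone cannot distinguish `u_same` from `u_new`, which is exactly why model spaces carry arrows (deltas) and are categories rather than sets.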

#### 2.2 Consistency restoration via update propagation: An Example

Figure 1 presents a simple example of delta propagation for consistency restoration. Models consist of objects (in the sense of OO programming) with attributes (a.k.a. labelled records); e.g., the source model A consists of three objects identified by their oids (object identifiers) #A, #J, #M (think of employees of some company) with attribute values as shown in the table: attribute Expr. refers to Experience measured by a number of years, and Depart. is the column of department names. The schema of the table, i.e., the triple S<sub>**A**</sub> of attributes (Name, Expr., Depart.) with their domains of values String, Integer, String resp., determines a model space **A**. A model X ∈ **A** is given by its set of objects OID<sup>X</sup> together with three functions Name<sup>X</sup>, Expr.<sup>X</sup>, Depart.<sup>X</sup> from the same domain OID<sup>X</sup> to targets String, Integer, String resp., which are compactly specified by tables as shown for model A. The target model space **B** is given by a similar schema S<sub>**B**</sub> consisting of two attributes. The **B**-view get(X) of an **A**-model X is computed by selecting those oids #O ∈ OID<sup>X</sup> for which Depart.<sup>X</sup>(#O) is an *IT-department*, i.e., an element of the set IT =<sub>def</sub> {ML, DB}. For example, the upper part of the figure shows the IT-view B of model A.
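The view computation can be sketched in a few lines of Python. The dict-based encoding and the attribute values are invented for illustration; only the selection criterion (membership of Depart. in IT) follows the text.

```python
IT = {"ML", "DB"}  # the IT-departments, as in the text

# A model: oid -> (Name, Expr., Depart.); values here are illustrative.
A = {"#A": ("Ann", 3, "SE"), "#J": ("John", 5, "DB"), "#M": ("Mary", 2, "ML")}

def get(X):
    """B-view: keep oids whose department is in IT, project away Depart."""
    return {oid: (name, expr)
            for oid, (name, expr, dept) in X.items() if dept in IT}

B = get(A)  # the IT-view of A: only #J and #M survive, without Depart.
```

Note that `get` is a Select-Project query, a prototypical example of the monotonic queries mentioned later in the section.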

We assume that all column names in schemas S<sub>**A**</sub> and S<sub>**B**</sub> are qualified by schema names, e.g., OID@S<sub>**A**</sub>, OID@S<sub>**B**</sub>, etc., so that schemas are disjoint except for elementary domains like String. Also disjoint are OID-values, e.g., #J@A and #J@B are different elements, but constants like John and Mary are elements of the set String shared by both schemas. To shorten long expressions in the diagrams, we will often omit qualifiers and write #J = #J meaning #J@A = #J@B or #J@B = #J@B′ depending on the context given by the diagram; often we will also write #J and #J′ for such OIDs. Also, when we write #J = #J′ inside block arrows denoting updates, we actually mean a pair, e.g., (#J@B, #J@B′).

Given two models over the same schema, say, B and B′ over S<sub>**B**</sub>, an update v: B → B′ is a relation v ⊆ OID<sup>B</sup>×OID<sup>B′</sup>; if a schema contains several nodes, an update should provide a relation v<sub>N</sub> for each node N in the schema.

Note an essential difference between the two parallel updates v<sub>1</sub>, v<sub>2</sub>: B → B′ specified in the figure. Update v<sub>1</sub> says that John's name was changed to Jon (think of fixing a typo), and the experience data for Mary were also corrected (either because of a typo or, e.g., because the department started to use a new ML method with which Mary has a longer experience). Update v<sub>2</sub> specifies the same story for John but a new story for Mary: it says that Mary #M left the IT-view and Mary #M′ is a new employee in one of the IT-departments.

Fig. 1: Example of update propagation

#### 2.3 Update propagation and update policies

The updated view B′ is inconsistent with the source A, and the latter is to be updated accordingly — we say that update v is to be propagated back to A. Propagation of v<sub>1</sub> is easy: we just update the values of the attributes accordingly, as shown in the figure in the block arrow u<sub>1</sub>: A → A′<sub>1</sub> (in black). Importantly, propagation needs two pieces of data: the view update v<sub>1</sub> and the original state A of the source, as shown in the figure by the two data-flow lines into the chevron 1:put; the latter denotes an invocation of the backward propagation operation put (read "put view update back to the source"). The quadruple 1 = (v<sub>1</sub>, A, u<sub>1</sub>, A′<sub>1</sub>) can be seen as an *instance* of operation put, hence the notation 1:put (borrowed from the UML).

Propagation of update v<sub>2</sub> is more challenging: Mary can disappear from the IT-view because a) she quit the company, b) she transitioned to a non-IT department, or c) the view definition has changed, e.g., the new view must only show employees with more than 5 years of experience. Choosing between these possibilities is often called choosing an *(update) policy*. We will consider the case of changing the view in Sect. 3; in the current section we discuss policies a) and b) (ignore for now the propagation scenario shown in blue in the lower right corner of the figure, which shows policy c)).

For policy a), further referred to as *quitting* and briefly denoted by qt, the result of update propagation is shown in the figure in green: notice the update (block) arrow u<sub>2</sub><sup>qt</sup> and its result, model A′<sub>2</sub><sup>qt</sup>, produced by invoking operation put<sup>qt</sup>. Note that while we know that the new employee Mary works in one of the IT departments, we do not know in which one. This is specified with a special value '?' (a.k.a. labelled null in the database parlance).

For policy b), further referred to as *transition* and denoted by tr, the result of update propagation is shown in the figure in orange: notice update arrow u<sub>2</sub><sup>tr</sup> and its result, model A′<sub>2</sub><sup>tr</sup>, produced by put<sup>tr</sup>. Mary #M is the old employee who transitioned to a new non-IT department, for which her expertise is unknown. Mary #M′ is a new employee in one of the IT-departments (we assume that the set of departments is not exhausted by those appearing in a particular state A ∈ **A**). There are also updates whose backward propagation is uniquely defined and does not need a policy; update v<sub>1</sub> is such an update.

An important property of the update propagations we have considered is that they restore consistency: the view of the updated source equals the updated view that initiated the propagation: get(A′) = B′; moreover, this equality extends to update arrows: get(u<sub>i</sub>) = v<sub>i</sub>, i = 1, 2. Such extensions can be derived from view definitions if the latter are determined by so-called monotonic queries (which encompass a wide class of practically useful queries, including the Select-Project-Join class). For views defined by non-monotonic queries, in order to obtain get's action on source updates u: A → A′, a suitable policy is to be added to the view definition (see [1,14,12] for details and discussion). Moreover, normally get preserves identity updates, get(id<sub>A</sub>) = id<sub>get(A)</sub>, and update composition: for any u: A → A′ and u′: A′ → A′′, the equality get(u; u′) = get(u); get(u′) holds.

## 2.4 Delta lenses

Our discussion of the example can be summarized in the following algebraic terms. We have two categories of *models* and *updates*, **A** and **B**, and a functor get: **A** → **B** incrementally computing **B**-views of **A**-models (we will often write A.get for get(A)). We also suppose that, for a chosen update policy, we have worked out precise procedures for propagating any view update backwards. This gives us a family of operations put<sub>A</sub>: **A**(A, \_) ← **B**(A.get, \_) indexed by **A**-objects A ∈ **A**<sub>0</sub>, for which we write put<sub>A</sub>.v or put<sub>A</sub>(v) interchangeably.

Definition 1 (Delta Lenses ([12])) Let **A**, **B** be two categories. An *(asymmetric delta) lens* from **A** (the source of the lens) to **B** (the target) is a pair ℓ = (get, put), where get: **A** → **B** is a functor and put is a family of operations put<sub>A</sub>: **A**(A, \_) ← **B**(A.get, \_) indexed by objects of **A**, A ∈ **A**<sub>0</sub>. Given A, operation put<sub>A</sub> maps any arrow v: A.get → B′ to an arrow u: A → A′ such that A′.get = B′. The last condition is called the (co)discrete Putget law:

(Putget)<sub>0</sub> (put<sub>A</sub>.v).cod.get<sub>0</sub> = v.cod for all A ∈ **A**<sub>0</sub> and v ∈ **B**(A.get, \_)

where get<sub>0</sub> denotes the object function of functor get. We will write a lens as an arrow ℓ: **A** → **B** going in the direction of get.

Note that the family put corresponds to a chosen update policy; e.g., in terms of the example above, for the same view functor get, we have two families of put-operations, put<sup>qt</sup> and put<sup>tr</sup>, corresponding to the two update policies we discussed. These two policies determine two lenses ℓ<sup>qt</sup> = (get, put<sup>qt</sup>) and ℓ<sup>tr</sup> = (get, put<sup>tr</sup>) sharing the same get.
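A state-based Python sketch of one such lens may help fix intuitions. It implements the quitting policy and checks only the codiscrete Putget₀ condition; deltas are elided (so this is codiscrete, not the full delta-lens structure), and all names and data are illustrative.

```python
IT = {"ML", "DB"}

def get(X):
    """IT-view; '?' stands for an unknown IT department (labelled null)."""
    return {oid: (name, expr) for oid, (name, expr, dept) in X.items()
            if dept in IT or dept == "?"}

def put_qt(A, B2):
    """'Quitting' policy: oids that left the view have quit the company;
    fresh oids of B2 join some unknown IT department ('?')."""
    A2 = {}
    for oid, (name, expr, dept) in A.items():
        in_view = dept in IT or dept == "?"
        if in_view and oid not in B2:
            continue                      # disappeared from the view: quit
        if oid in B2:
            n, e = B2[oid]
            A2[oid] = (n, e, dept)        # view attributes win
        else:
            A2[oid] = (name, expr, dept)  # non-IT employee, untouched
    for oid, (n, e) in B2.items():
        if oid not in A2:
            A2[oid] = (n, e, "?")         # new IT employee, dept unknown
    return A2

A  = {"#A": ("Ann", 3, "SE"), "#J": ("John", 5, "DB"), "#M": ("Mary", 2, "ML")}
B2 = {"#J": ("Jon", 5), "#M2": ("Mary", 6)}     # updated view
A2 = put_qt(A, B2)
assert get(A2) == B2   # Putget_0: the view of the updated source equals B'
```

A transition-policy lens would keep the same `get` and replace only `put_qt`, which is exactly the sense in which ℓ<sup>qt</sup> and ℓ<sup>tr</sup> share their get component.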


(Stability) id<sub>A</sub> = put<sub>A</sub>.id<sub>A.get</sub> for all A ∈ **A**<sub>0</sub>

(Putget) (put<sub>A</sub>.v).get = v for all A ∈ **A**<sub>0</sub> and all v ∈ **B**(A.get, \_)

*Remark 1.* The Stability law says that a wb lens does nothing if nothing happens on the target side (no actions without triggers). Putget requires consistency after the backward propagation is finished. Note the distinction between the Putget<sub>0</sub> condition included in the very definition of a lens, and the full Putget law required for wb lenses. The former is needed to ensure smooth tiling of put-squares (i.e., arrow squares describing the application of put to a view update and its result) both horizontally (for sequential composition) and vertically (not considered in the paper). The full Putget assures true consistency, as considering a state B′ alone does not say much about the real update, and elements of B′ cannot be properly interpreted. The real story is specified by the delta v: B → B′, and consistency restoration needs the full (Putget) law as above.<sup>2</sup>

A more detailed trailer of lenses can be found in the long version [9].

<sup>2</sup>As shown in [6], the Putget<sub>0</sub> condition is needed if we want to define the operations put separately from the functor get: then we still need a function get<sub>0</sub>: **A**<sub>0</sub> → **B**<sub>0</sub> and the codiscrete Putget law to ensure a reasonable behaviour of put.

## 3 Asymmetric Learning Lenses with Amendments

We will begin with a brief motivating discussion, and then proceed with formal definitions.

#### 3.1 Does Bx need categorical learning?

Enriching delta lenses with learning capabilities has a clear practical sense for Bx. Having a lens (get, put): **A** → **B** and an inconsistency A.get ≠ B′, the idea of learning extends the notion of the search space and allows us to update the transformation itself so that the final consistency is achieved for a new transformation get′: A.get′ = B′. For example, in the case shown in Fig. 1, the disappearance of Mary #M in the updated view B′ can be caused by a change of the view definition, which now requires showing only those employees whose experience is more than 5 years; hence Mary #M is to be removed from the view, while Mary #M′ is a new IT-employee whose experience satisfies the new definition. Then the update v<sub>2</sub> can be propagated as shown in the bottom right corner of Fig. 1, where the index par indicates a new update policy allowing for view definition (parameter) change.

To manage the extended search possibilities, we parameterize the space of transformations as a family of mappings get<sub>p</sub>: **A** → **B** indexed over some parameter space, p ∈ **P**. For example, we may define the IT-view to be parameterized by the experience of the employees shown in the view (with *any* experience as a special parameter value). Then we have two interrelated propagation operations that map an update B → B′ to a parameter update p → p′ and a source update A → A′. Thus, the extended search space allows for new update policies that consider updating the parameter as an update propagation possibility. The possibility to update the transformation appears to be very natural in at least two important Bx scenarios: a) model transformation design and b) model transformation evolution (cf. [21]), which necessitates enriching the delta lens framework with parameterization and learning. Note that all transformations get<sub>p</sub>, p ∈ **P**, are to be elements of the same lens, and the operations put are *not* indexed by p; hence, formalizing learning by considering a family of ordinary lenses would *not* do the job.
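A minimal codiscrete sketch of such a parameter-changing policy, in Python: the parameter is an experience threshold, and put repairs the inconsistency by searching for a new parameter value. The threshold-based view and the brute-force search are illustrative assumptions, not the paper's construction.

```python
def get(p, X):
    """Parameterized view: employees with experience strictly above threshold p."""
    return {oid: (name, expr) for oid, (name, expr) in X.items() if expr > p}

def put_par(p, A, B2):
    """Propagate a view update by searching for a threshold p2 with
    get(p2, A) == B2; fall back to (p, A) if no such parameter exists."""
    for p2 in range(0, max((e for _, e in A.values()), default=0) + 1):
        if get(p2, A) == B2:
            return p2, A          # parameter update only, source unchanged
    return p, A

A  = {"#J": ("John", 7), "#M": ("Mary", 2), "#P": ("Paul", 6)}
B  = get(0, A)                                # initial view: everyone
B2 = {"#J": ("John", 7), "#P": ("Paul", 6)}   # Mary dropped from the view
p2, A2 = put_par(0, A, B2)
assert get(p2, A2) == B2   # consistency restored by changing the parameter
```

Here put returns a pair (parameter update, source update), which is exactly the shape of the two interrelated propagation operations described above; in the categorical setting both components become arrows rather than states.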

*Categorical vs. codiscrete learning.* Suppose that the parameter p is itself a set, e.g., the set of departments forming a view can vary depending on some context. Then an update from p to p′ has a relational structure as discussed above, i.e., e: p → p′ is a relation e ⊆ p×p′ specifying which departments disappeared from the view and which are freshly added. This is a general phenomenon: as soon as parameters are structures (sets of objects, or graphs of objects and attributes), a parameter change becomes a structured delta, and the space of parameters gives rise to a category **P**. The search/propagation procedure returns an arrow e: p → p′ in this category, which updates the parameter value from p to p′. Hence, a general model of supervised learning should assume **P** to be a category (and we say that learning is *categorical*). The case of the parameter space being a set is captured by considering a codiscrete category **P** whose only arrows are pairs of its objects; we call such learning *codiscrete*.

### 3.2 Ala-lenses

The notion of a *parameterized functor (p-functor)* is fundamental for ala-lenses but is not a lens notion per se, and is thus placed in Appendix Sect. A.1. We will work with its exponential (rather than the equivalent product-based) formulation, but will uncurry and curry back when necessary, often using the same symbol for an arrow f and its uncurried version f̌.

Definition 3 (ala-lenses) Let **A** and **B** be categories. An *ala-lens* from **A** (the *source* of the lens) to **B** (the *target*) is a pair ℓ = (get, put) whose first component is a p-functor get: **A** →<sub>**P**</sub> **B**, and whose second component is a triple of (families of) operations put = (put<sup>upd</sup><sub>p,A</sub>, put<sup>req</sup><sub>p,A</sub>, put<sup>self</sup><sub>p,A</sub>) indexed by pairs p ∈ **P**<sub>0</sub>, A ∈ **A**<sub>0</sub>; the arities of the operations are specified below, after we introduce some notation. The names req (for 'request') and upd (for 'update') are chosen to match the terminology in [17].

Categories **A**, **B** are called *model spaces*; their objects are *models* and their arrows are *(model) updates* or *deltas*. Objects of **P** are called *parameters* and are denoted by lowercase letters p, p′, ... rather than capitals, to avoid confusion with [17], in which capital P is used for the entire parameter set. Arrows of **P** are called *parameter deltas*. For a parameter p ∈ **P**<sub>0</sub>, we write get<sub>p</sub> for the functor get(p): **A** → **B** (read "get **B**-views of **A**"), and if A ∈ **A**<sub>0</sub> is a source model, its get<sub>p</sub>-view is denoted by get<sub>p</sub>(A) or A.get<sub>p</sub> or even A<sub>p</sub> (so that \_<sub>p</sub> becomes yet another notation for the functor get<sub>p</sub>). Given a parameter delta e: p → p′ and a source model A ∈ **A**<sub>0</sub>, the model delta get(e)(A): get<sub>p</sub>(A) → get<sub>p′</sub>(A) will be denoted by get<sub>e</sub>(A) or e<sub>A</sub> (rather than A<sub>e</sub>, as we would like to keep capital letters for objects only). In the uncurried version, get<sub>e</sub>(A) is nothing but ǧet(e, id<sub>A</sub>).

Since get<sub>e</sub> is a natural transformation, for any delta u: A → A′ we have a commutative square e<sub>A</sub>; u<sub>p′</sub> = u<sub>p</sub>; e<sub>A′</sub> (whose diagonal is ǧet(e, u)). We will denote the diagonal of this square by u.get<sub>e</sub> or u<sub>e</sub>: A<sub>p</sub> → A′<sub>p′</sub>. Thus, we use the notation

$$\begin{array}{ll}(1) & A\_p \stackrel{\text{def}}{=} A.\mathtt{get}\_p \stackrel{\text{def}}{=} \mathtt{get}\_p(A) \stackrel{\text{def}}{=} \mathtt{get}(p)(A) \\ & u\_e \stackrel{\text{def}}{=} u.\mathtt{get}\_e \stackrel{\text{def}}{=} \mathtt{get}\_e(u) \stackrel{\text{def}}{=} \mathtt{get}(e)(u) \stackrel{\text{def}}{=} e\_A; u\_{p'} \stackrel{\text{nat}}{=} u\_p; e\_{A'} \colon A\_p \to A'\_{p'} \end{array}$$

Now we describe the operations put. They all have the same indexing set **P**<sub>0</sub> × **A**<sub>0</sub> and the same domain: for any index (p, A) and any model delta v: A<sub>p</sub> → B′ in **B**, the value put<sup>**x**</sup><sub>p,A</sub>(v), **x** ∈ {req, upd, self}, is defined and unique:

(2) put<sup>upd</sup><sub>p,A</sub>(v): p → p′ is a parameter delta from p, put<sup>req</sup><sub>p,A</sub>(v): A → A′ is a model delta from A, and put<sup>self</sup><sub>p,A</sub>(v): B′ → A′<sub>p′</sub> is a model delta from B′ called the *amendment* and denoted by v<sup>@</sup>.

Note that the definition of put<sup>self</sup> involves an equational dependency between all three operations: for all A ∈ **A**<sub>0</sub>, v ∈ **B**(A.get<sub>p</sub>, \_), we require

$$(\mathrm{Putget})\_0 \quad (\mathtt{put}\_{p,A}^{\mathsf{req}}(v)).\mathrm{cod}.\mathtt{get}\_{p'} = (v; \mathtt{put}\_{p,A}^{\mathsf{self}}(v)).\mathrm{cod} \quad \text{where } p' = (\mathtt{put}\_{p,A}^{\mathsf{upd}}(v)).\mathrm{cod}$$

We will write an ala-lens as an arrow ℓ = (get, put): **A** →<sub>**P**</sub> **B**.

A lens is called *(twice) codiscrete* if the categories **A**, **B**, **P** are codiscrete, so that get: **A** →<sub>**P**</sub> **B** is a parameterized function. If only **P** is codiscrete, we call the lens a *codiscretely learning* delta lens, while if only the model spaces are codiscrete, we call it a *categorically learning* codiscrete lens.
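As a concrete illustration of the codiscrete case, the following toy sketch (entirely invented: a two-coordinate source model, a parameter selecting the projected coordinate, and a trivial amendment) packages a get with a put that prefers a parameter update over a source update:

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class AlaLens:
    """Codiscrete sketch: deltas are identified with their endpoints, so put
    maps a desired view B' to a pair (p', A'); the amendment is trivial."""
    get: Callable[[Any, Any], Any]                    # get(p, A) = A.get_p
    put: Callable[[Any, Any, Any], Tuple[Any, Any]]   # (p, A, B') -> (p', A')

def proj_get(p, A):                  # the view shows coordinate p of the pair A
    return A[p]

def proj_put(p, A, b_new):
    if A[1 - p] == b_new:            # learn: explain B' by a parameter change
        return 1 - p, A
    A2 = list(A)                     # otherwise propagate to the source
    A2[p] = b_new
    return p, tuple(A2)

proj = AlaLens(proj_get, proj_put)

# Stability: propagating the identity view update changes nothing.
assert proj.put(0, (3, 5), proj.get(0, (3, 5))) == (0, (3, 5))
# Putget: the view of the updated state equals the requested B'.
p2, A2 = proj.put(0, (3, 5), 5)
assert (p2, A2) == (1, (3, 5)) and proj.get(p2, A2) == 5
```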

The diagram in Fig. 2 shows how a lens' operations are interrelated. The upper part shows an arrow e: p → p′ in category **P** and the two corresponding functors from **A** to **B**. The lower part is to be seen as a 3D prism with visible front face A A<sub>p</sub> A′<sub>p′</sub> A′ and visible upper face A A<sub>p</sub> A<sub>p′</sub>; the bottom and the two back faces are invisible, and the corresponding arrows are dashed. The prism denotes an algebraic term: given elements are shown with black fill and white font, while derived elements are blue (recalling that they are mechanically computed) and blank (double-body arrows are considered "blank"). The two pairs of arrows originating from A and A′ are not blank, because they denote pairs of nodes (in UML terms, *links*) rather than mappings/deltas between nodes.

Fig. 2: Ala-lens operations

The equational definitions of the deltas e, u, v<sup>@</sup> are written in the three callouts near them. The right back face of the prism is formed by the two vertical derived deltas u<sub>p</sub> = u.get<sub>p</sub> and u<sub>p′</sub> = u.get<sub>p′</sub>, and the two matching horizontal derived deltas e<sub>A</sub> = get<sub>e</sub>(A) and e<sub>A′</sub> = get<sub>e</sub>(A′); together they form a commutative square due to the naturality of get(e), as explained earlier.

Definition 4 (Well-behavedness) An ala-lens is called *well-behaved (wb)* if the following two laws hold for all p ∈ **P**<sub>0</sub>, A ∈ **A**<sub>0</sub> and v: A<sub>p</sub> → B′:

$$\begin{array}{ll} (\textbf{Stability}) & \text{if } v = \text{id}\_{A\_p} \text{ then all three propagated updates } e, u, v^{@} \text{ are identities:}\\ & \mathtt{put}\_{p,A}^{\mathsf{upd}}(\text{id}\_{A\_p}) = \text{id}\_p, \quad \mathtt{put}\_{p,A}^{\mathsf{req}}(\text{id}\_{A\_p}) = \text{id}\_A, \quad \mathtt{put}\_{p,A}^{\mathsf{self}}(\text{id}\_{A\_p}) = \text{id}\_{A\_p} \\ (\textbf{Putget}) & (\mathtt{put}\_{p,A}^{\mathsf{req}}(v)).\mathtt{get}\_e = v; v^{@} \text{ where } e = \mathtt{put}\_{p,A}^{\mathsf{upd}}(v) \text{ and } v^{@} = \mathtt{put}\_{p,A}^{\mathsf{self}}(v) \end{array}$$

*Remark 2.* Note that Remark 1 about the Putget law is again applicable.

*Example 1 (Identity lenses).* Any category **A** gives rise to an ala-lens *id* **<sup>A</sup>** with the following components. The source and target spaces are equal to **A**, and the parameter space is **1**. Functor get is the identity functor and all puts are identities. Obviously, this lens is wb.

*Example 2 (Iso-lenses).* Let ι: **A** → **B** be an isomorphism between model spaces. It gives rise to a wb ala-lens ℓ(ι): **A** → **B** with **P**<sub>ℓ(ι)</sub> = **1** = {∗} as follows. Given any A in **A** and v: ι(A) → B′ in **B**, we define put<sup>ℓ(ι),req</sup><sub>∗,A</sub>(v) = ι<sup>−1</sup>(v), while the two other put operations map v to identities.

*Example 3 (Bx lenses).* Examples of wb aa-lenses modelling a Bx can be found in [11]: they can all be considered ala-lenses with the trivial parameter space **1**.

*Example 4 (Learners).* Learners as defined in [17] are codiscretely learning codiscrete lenses with amendment, and as such satisfy the (amended) Putget law (Remark 1). Looking in the opposite direction, ala-lenses are a categorification of learners, as detailed in Fig. 8 on p. 194.

## 4 Compositionality of ala-lenses

This section explores the compositional structure of the universe of ala-lenses; especially interesting is their sequential composition. We will begin with a small example demonstrating sequential composition of ordinary lenses and showing that the notion of update policy transcends individual lenses. Then we define sequential and parallel composition of ala-lenses (the former is much more involved than for ordinary lenses) and show that wb ala-lenses can be organized into an sm-category. Finally, we formalize the notion of a compositional update policy via the notion of a compositional bidirectional language.

### 4.1 Compositionality of update policies: An example

Fig. 3 extends the example of Fig. 1 with a new model space **C**, whose schema consists of the single attribute Name, and a view of the IT-view in which only employees of the ML department are to be shown. Thus, we now have two functors, get1: **A** → **B** and get2: **B** → **C**, and their composition Get: **A** → **C** (referred to as the *long* get). The top part of Fig. 3 shows how it works for the model A considered above.

Each of the two policies, policy qt (green) and policy tr (orange), in which a person's disappearance from the view is interpreted, respectively, as quitting the company or as transitioning to a department not included in the view, is applicable to the new view mappings get2 and Get, thus giving us the six lenses shown in Fig. 4 with solid arrows; amongst them, the lenses L<sup>qt</sup> and L<sup>tr</sup> are obtained by applying the respective policy *pol* to the (long) functor Get, and we will refer to them as *long* lenses. In addition, we can compose lenses of the same colour, as shown in Fig. 4 by dashed arrows (we could also compose lenses of different colours, ℓ<sup>qt</sup><sub>1</sub> with ℓ<sup>tr</sup><sub>2</sub> and ℓ<sup>tr</sup><sub>1</sub> with ℓ<sup>qt</sup><sub>2</sub>, but we do not need them). Now an important question is how the long and the composed lenses are related: are L<sup>*pol*</sup> and ℓ<sup>*pol*</sup><sub>1</sub>; ℓ<sup>*pol*</sup><sub>2</sub>, for *pol* ∈ {qt, tr}, equal (perhaps up to some equivalence) or different?

Fig. 3: Example cont'd: functoriality of update policies

Fig. 3 demonstrates how the mechanisms work on a simple example. We begin with an update w of the view C that says that Mary #M left the ML department, and a new Mary #M′ was hired for ML. Policy qt interprets Mary's disappearance as quitting the company; hence this Mary appears neither in the view B<sup>qt</sup> produced by put2<sup>qt</sup> nor in the view A<sup>qt</sup><sub>12</sub> produced from B<sup>qt</sup> by put1<sup>qt</sup>, and the updates v<sup>qt</sup> and u<sup>qt</sup><sub>12</sub> are written accordingly. Obviously, Mary also does not appear in the view A′<sup>qt</sup> produced by the long lens's Put<sup>qt</sup>. Thus, put1<sup>qt</sup><sub>A</sub>(put2<sup>qt</sup><sub>A</sub>(w)) = Put<sup>qt</sup><sub>A</sub>(w), and it is easy to understand that this equality will hold for any source model A and any update w: C → C′, due to the nature of our two views get1 and get2. Hence, L<sup>qt</sup> = ℓ<sup>qt</sup><sub>1</sub>; ℓ<sup>qt</sup><sub>2</sub>, where L<sup>qt</sup> = (Get, Put<sup>qt</sup>) and ℓ<sup>qt</sup><sub>i</sub> = (get<sub>i</sub>, put<sub>i</sub><sup>qt</sup>).

Fig. 4: Lens combination schemas for Fig. 3

The situation with policy tr is more interesting. The model A′<sup>tr</sup><sub>12</sub> produced by the composed lens ℓ<sup>tr</sup><sub>1</sub>; ℓ<sup>tr</sup><sub>2</sub> and the model A′<sup>tr</sup> produced by the long lens L<sup>tr</sup> = (Get, Put<sup>tr</sup>) are different, as shown in the figure (notice the two different values for Mary's department, framed with red ovals in the models). Indeed, the composed lens has more information about the old employee Mary: it knows that Mary was in the IT view, and hence can propagate the update more accurately. The comparison update δ<sup>tr</sup><sub>A,w</sub>: A′<sup>tr</sup> → A′<sup>tr</sup><sub>12</sub> adds this missing information, so that the equality u<sup>tr</sup>; δ<sup>tr</sup><sub>A,w</sub> = u<sup>tr</sup><sub>12</sub> holds. This is a general phenomenon: functor composition loses information and, in general, the functor Get = get1; get2 knows less than the pair (get1, get2). Hence, the operation Put back-propagating updates over Get (we will also say *inverting* Get) will, in general, result in less certain models than the composition put1 ∘ put2, which inverts the composition get1; get2 (a discussion and examples of this phenomenon in the context of vertical composition of updates can be found in [8]). Hence, comparison updates such as δ<sup>tr</sup><sub>A,w</sub> should exist for any A and any w: A.Get → C′, and together they should give rise to something like a natural transformation between lenses, δ<sup>tr</sup>: L<sup>tr</sup> ⇒ ℓ<sup>tr</sup><sub>1</sub>; ℓ<sup>tr</sup><sub>2</sub>. To make this notion precise, we need a notion of natural transformation between "functors" put, which we leave for future work. In the present paper, we will consider policies like qt, for which strict equality holds.
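The information-loss effect can be replayed in code. The following sketch uses invented departments, names, and deliberately crude tr-style put policies; it only illustrates that a put for the pair (get1, get2) can be more precise than a put for the long Get:

```python
def get1(A):            # IT-view: keep IT and ML employees, with departments
    return {n: d for n, d in A.items() if d in ("IT", "ML")}

def get2(B):            # names of ML employees only
    return sorted(n for n, d in B.items() if d == "ML")

def Get(A):             # the long view forgets the intermediate model B
    return get2(get1(A))

def put2_tr(B, names):
    # tr over get2: a person vanishing from the ML view is moved to IT,
    # the only other department visible in B.
    return {n: ("IT" if d == "ML" and n not in names else d)
            for n, d in B.items()}

def put1_tr(A, B_new):
    # tr over get1: copy back the departments of the surviving employees.
    return {n: B_new.get(n, d) for n, d in A.items()}

def Put_tr(A, names):
    # tr over Get: it only knows the person left the ML view, so it must
    # guess some department outside the view, here "Other".
    return {n: ("Other" if A[n] == "ML" and n not in names else d)
            for n, d in A.items()}

A = {"Mary": "ML"}
w = []                                   # Mary disappears from the ML view
assert put1_tr(A, put2_tr(get1(A), w)) == {"Mary": "IT"}   # composed lens
assert Put_tr(A, w) == {"Mary": "Other"}                   # long lens differs
```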

### 4.2 Sequential composition of ala-lenses

Let k: **A** → **B** and ℓ: **B** → **C** be two ala-lenses with parameterized functors get<sup>k</sup>: **P** → [**A**, **B**] and get<sup>ℓ</sup>: **Q** → [**B**, **C**], respectively. Their *composition* is the following ala-lens k; ℓ. Its parameter space is the product **P** × **Q**, and the get-family is defined as follows. For any pair of parameters (p, q) (we will write pq), get<sup>k;ℓ</sup><sub>pq</sub> = get<sup>k</sup><sub>p</sub>; get<sup>ℓ</sup><sub>q</sub>: **A** → **C**. Given a pair of parameter deltas e: p → p′ in **P** and h: q → q′ in **Q**, their get<sup>k;ℓ</sup>-image is the Godement product ∗ of natural transformations: get<sup>k;ℓ</sup>(eh) = get<sup>k</sup>(e) ∗ get<sup>ℓ</sup>(h) (we will also write get<sup>k</sup><sub>e</sub> ∗ get<sup>ℓ</sup><sub>h</sub>).

Fig. 5: Sequential composition of ala-lenses

Now we define the propagation operations put of k; ℓ. Let (A, pq) with A ∈ **A**<sub>0</sub>, pq ∈ (**P**×**Q**)<sub>0</sub> and A.get<sup>k</sup><sub>p</sub>.get<sup>ℓ</sup><sub>q</sub> = A<sub>pq</sub> ∈ **C**<sub>0</sub> be a state of the lens k; ℓ, and let w: A<sub>pq</sub> → C′ be a target update, as shown in Fig. 5. For the first propagation step, we run lens ℓ as shown in Fig. 5 with the blue colour for derived elements: this is just an instantiation of the pattern of Fig. 2, with the source object being A<sub>p</sub> = A.get<sup>k</sup><sub>p</sub> and the parameter q. The results are the deltas

$$(3)\quad h = \mathtt{put}\_{q,A\_p}^{\ell,\mathsf{upd}}(w) \colon q \to q', \quad v = \mathtt{put}\_{q,A\_p}^{\ell,\mathsf{req}}(w) \colon A\_p \to B', \quad w^{@} = \mathtt{put}\_{q,A\_p}^{\ell,\mathsf{self}}(w) \colon C' \to B'\_{q'}.$$

Next we run lens k at state (p, A) on the target update v produced by lens ℓ; this is yet another instantiation of the pattern of Fig. 2 (this time with the green colour for derived elements), which produces three deltas (4):

$$e = \mathtt{put}\_{p,A}^{k,\mathsf{upd}}(v) \colon p \to p', \quad u = \mathtt{put}\_{p,A}^{k,\mathsf{req}}(v) \colon A \to A', \quad v^{@} = \mathtt{put}\_{p,A}^{k,\mathsf{self}}(v) \colon B' \to A'\_{p'}.$$

These data specify the green prism adjacent to the blue prism: the edge v of the latter is the "first half" of the right back face diagonal A<sub>p</sub> → A′<sub>p′</sub> of the former. In order to make an instance of the pattern of Fig. 2 for the lens k; ℓ, we need to extend the blue-green diagram to a triangular prism by filling in the corresponding "empty space". These filling-in arrows are provided by the functors get<sup>ℓ</sup> and get<sup>k</sup> and are shown in orange (where we have chosen one of the two equivalent ways of forming the Godement product; note the two curved brown arrows). In this way we obtain yet another instantiation of the pattern of Fig. 2, denoted by k; ℓ:

$$(5)\quad \mathtt{put}\_{pq,A}^{(k;\ell)\mathsf{upd}}(w) = (e, h), \quad \mathtt{put}\_{pq,A}^{(k;\ell)\mathsf{req}}(w) = u, \quad \mathtt{put}\_{pq,A}^{(k;\ell)\mathsf{self}}(w) = w^{@}; v^{@}\_{q'}$$

where v<sup>@</sup><sub>q′</sub> denotes v<sup>@</sup>.get<sup>ℓ</sup><sub>q′</sub>. Thus, we have built an ala-lens k; ℓ, which satisfies the equation Putget<sub>0</sub> by construction.

Theorem 1 (Sequential composition and lens laws). *Given ala-lenses k: **A** → **B** and ℓ: **B** → **C**, let the lens k; ℓ: **A** → **C** be their sequential composition as defined above. Then the lens k; ℓ is wb as soon as the lenses k and ℓ are such.*

See [9, Appendix A.3] for a proof.
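In the codiscrete setting, the two-step propagation of equations (3)-(5) can be sketched as follows. Lenses are plain (get, put) pairs with trivial amendments, and the toy arithmetic lenses at the end are invented purely for illustration:

```python
def compose(lens_k, lens_l):
    """Sequential composition k;l: run lens l first, then feed its request
    delta to lens k, following equations (3)-(5) with trivial amendments."""
    get_k, put_k = lens_k
    get_l, put_l = lens_l

    def get(pq, A):                      # get_{pq} = get_p ; get_q
        p, q = pq
        return get_l(q, get_k(p, A))

    def put(pq, A, w):
        p, q = pq
        A_p = get_k(p, A)
        q2, B2 = put_l(q, A_p, w)        # (3): run lens l at state (q, A_p)
        p2, A2 = put_k(p, A, B2)         # (4): run lens k at state (p, A)
        return (p2, q2), A2              # (5): composed parameter delta (e, h)

    return get, put

# Toy instantiation: an additive-shift lens followed by a scaling lens,
# both with a request-only policy (the parameter is never updated).
shift = (lambda p, A: A + p, lambda p, A, b: (p, b - p))
scale = (lambda q, B: B * q, lambda q, B, c: (q, c // q))
Get, Put = compose(shift, scale)

assert Get((1, 2), 3) == 8                # (3 + 1) * 2
assert Put((1, 2), 3, 10) == ((1, 2), 4)  # 10 is propagated back to 4
assert Get((1, 2), 4) == 10               # Putget holds for this update
```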

### 4.3 Parallel composition of ala-lenses

Let ℓ<sub>i</sub>: **A**<sub>i</sub> → **B**<sub>i</sub>, i = 1, 2, be two ala-lenses with parameter spaces **P**<sub>i</sub>. The lens ℓ<sub>1</sub>‖ℓ<sub>2</sub>: **A**<sub>1</sub>×**A**<sub>2</sub> → **B**<sub>1</sub>×**B**<sub>2</sub> is defined as follows. Its parameter space is ℓ<sub>1</sub>‖ℓ<sub>2</sub>.**P** = **P**<sub>1</sub> × **P**<sub>2</sub>. For any pair p<sub>1</sub>‖p<sub>2</sub> ∈ (**P**<sub>1</sub>×**P**<sub>2</sub>)<sub>0</sub>, define get<sup>ℓ1‖ℓ2</sup><sub>p1‖p2</sub> = get<sup>ℓ1</sup><sub>p1</sub> × get<sup>ℓ2</sup><sub>p2</sub> (we denote pairs of parameters by p<sub>1</sub>‖p<sub>2</sub> rather than p<sub>1</sub> ⊗ p<sub>2</sub> to shorten long formulas). Further, for any pair of models A<sub>1</sub>‖A<sub>2</sub> ∈ (**A**<sub>1</sub> × **A**<sub>2</sub>)<sub>0</sub> and deltas v<sub>1</sub>‖v<sub>2</sub>: (A<sub>1</sub>‖A<sub>2</sub>).get<sup>ℓ1‖ℓ2</sup><sub>p1‖p2</sub> → B′<sub>1</sub>‖B′<sub>2</sub>, we define componentwise

$$e = \mathtt{put}\_{p\_1\|p\_2,\,A\_1\|A\_2}^{(\ell\_1\|\ell\_2)\mathsf{upd}}(v\_1\|v\_2) \colon p\_1\|p\_2 \to p'\_1\|p'\_2$$

by setting e = e<sub>1</sub>‖e<sub>2</sub>, where e<sub>i</sub> = put<sup>ℓi,upd</sup><sub>pi,Ai</sub>(v<sub>i</sub>), i = 1, 2, and similarly for put<sup>(ℓ1‖ℓ2)req</sup><sub>p1‖p2,A1‖A2</sub> and put<sup>(ℓ1‖ℓ2)self</sup><sub>p1‖p2,A1‖A2</sub>. The following result is obvious.

Theorem 2 (Parallel composition and lens laws). *Lens* -1||-<sup>2</sup> *is wb as soon as lenses* -<sup>1</sup> *and* -<sup>2</sup> *are such.*
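A componentwise sketch of ℓ<sub>1</sub>‖ℓ<sub>2</sub> in the same codiscrete style (the additive-shift lens reused below is an invented toy example):

```python
def parallel(lens1, lens2):
    """Parallel composition l1 || l2: models, parameters and updates are
    pairs, and each component is propagated by its own lens independently."""
    (get1, put1), (get2, put2) = lens1, lens2

    def get(p12, A12):
        return (get1(p12[0], A12[0]), get2(p12[1], A12[1]))

    def put(p12, A12, v12):
        p1n, A1n = put1(p12[0], A12[0], v12[0])
        p2n, A2n = put2(p12[1], A12[1], v12[1])
        return (p1n, p2n), (A1n, A2n)

    return get, put

# Two copies of a shift lens with a request-only policy.
shift = (lambda p, A: A + p, lambda p, A, b: (p, b - p))
G, P = parallel(shift, shift)

assert G((1, 2), (10, 20)) == (11, 22)
assert P((1, 2), (10, 20), (15, 25)) == ((1, 2), (14, 23))
```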

### 4.4 Symmetric monoidal structure over ala-lenses

Our goal is to organize ala-lenses into an sm-category. To make sequential composition of ala-lenses associative, we need to consider them up to some equivalence (indeed, Cartesian product is not strictly associative).

Definition 5 (Ala-lens Equivalence) Two parallel ala-lenses ℓ, ℓ̂: **A** → **B** are called *equivalent* if their parameter spaces are isomorphic via a functor ι: **P** → **P̂** such that for any A ∈ **A**<sub>0</sub>, e: p → p′ in **P**, and v: A.get<sub>p</sub> → B′, the following holds (for **x** ∈ {req, self}):

$$A.\mathtt{get}\_e = A.\widehat{\mathtt{get}}\_{\iota(e)}, \quad \iota(\mathtt{put}\_{p,A}^{\mathsf{upd}}(v)) = \widehat{\mathtt{put}}\_{\iota(p),A}^{\mathsf{upd}}(v), \quad \text{and} \quad \mathtt{put}\_{p,A}^{\mathbf{x}}(v) = \widehat{\mathtt{put}}\_{\iota(p),A}^{\mathbf{x}}(v).$$

*Remark 3.* It would be more categorical to require delta isomorphisms (i.e., commutative squares whose horizontal edges are isomorphisms) rather than the equalities above. However, model spaces appearing in Bx practice are skeletal categories (and even stronger than skeletal, in the sense that all isos, including iso loops, are identities), for which isos become equalities, so that the generality would degenerate into equality anyway.

It is easy to see that the operations of sequential and parallel lens composition are compatible with lens equivalence and hence are well defined on equivalence classes. Below we identify lenses with their equivalence classes by default.

Theorem 3 (Ala-lenses form an sm-category). *Operations of sequential and parallel composition of ala-lenses defined above give rise to an sm-category* aLaLens*, whose objects are model spaces (= categories) and arrows are (equivalence classes of ) ala-lenses.* See [9, p.17 and Appendix A.4] for a proof.

### 4.5 Functoriality of learning in the *delta* lens setting

As the example in Sect. 4.1 shows, the notion of update policy transcends individual lenses. Hence, its proper formalization requires considering the entire category of ala-lenses and the functoriality of a suitable mapping.

Definition 6 (Bx-transformation language) A *compositional bidirectional model transformation language* Lbx is given by (i) an sm-category pGet(Lbx), whose objects are *(*Lbx*-)model spaces* and whose arrows are *(*Lbx*-)transformations*, supplied with a forgetful functor into pCat, and (ii) an sm-functor L<sup>Lbx</sup>: pGet(Lbx) → aLaLens such that the lower triangle in the inset diagram commutes. (Forgetful functors in this diagram are named "−X", with X referring to the structure to be forgotten.)

An Lbx-language is *well-behaved (wb)* if the functor L<sup>Lbx</sup> factorizes as shown by the upper triangle of the diagram.

*Example.* A major compositionality result of Fong *et al.* [17] states the existence of an sm-functor from the category Para of Euclidean spaces and parameterized differentiable functions (pd-functions) into the category Learn of learning algorithms (*learners*), as shown by the inset commutative diagram. (The functor is itself parameterized by a *step size* ε, 0 < ε ∈ ℝ, and an *error function* err: ℝ×ℝ → ℝ needed to specify the gradient descent procedure.) However, learners are nothing but codiscrete ala-lenses (see Sect. A.2), and thus the inset diagram is a codiscrete specialization of the diagram in Def. 6 above. That is, the category of Euclidean spaces and pd-functions, together with the gradient descent method for back propagation, gives rise to a (codiscrete) compositional bx-transformation language (over pSet rather than pCat).
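To see a learner as a codiscrete ala-lens concretely, here is a one-dimensional sketch with an invented linear model, squared-error function, and step size; put<sup>upd</sup> is a gradient step on the parameter and put<sup>req</sup> a gradient step on the input:

```python
EPS = 0.1                                   # step size epsilon (illustrative)

def get(p, a):                              # a one-parameter linear model
    return p * a

def put(p, a, b):
    """One gradient-descent step on the squared error (p*a - b)**2."""
    grad_p = 2 * (get(p, a) - b) * a        # put_upd: parameter update
    grad_a = 2 * (get(p, a) - b) * p        # put_req: source (input) update
    return p - EPS * grad_p, a - EPS * grad_a

# One propagation step moves the view towards the supervised target b = 2.0;
# the remaining discrepancy is what the amendment records.
p1, a1 = put(1.0, 1.0, 2.0)
assert abs(p1 - 1.2) < 1e-9 and abs(a1 - 1.2) < 1e-9
assert abs(get(p1, a1) - 2.0) < abs(get(1.0, 1.0) - 2.0)
```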

Finding a specifically Bx instance of Def. 6 (e.g., checking whether it holds for concrete languages and tools such as eMoflon [23] or groundTram [22]) is laborious and left for future work.

## 5 Related work

Figure 6 on the right is a simplified version of Fig. 8 on p. 194, convenient for our discussion here: the immediate related work is to be found in the areas located at points (0,1) (codiscrete learning lenses) and (1,0) (delta lenses) of the plane. For the point (0,1), the paper [17] by Fong, Spivak and Tuyéras is fundamental: they defined the notion of a codiscrete learning lens (called a learner), proved a fundamental result about the sm-functoriality of the gradient descent approach to

ML, and thus laid a foundation for the compositional approach to change propagation with learning. One follow-up of that work is the paper [16] by Fong and Johnson, in which they build an sm-functor Learn → sLens that maps learners to so-called symmetric lenses. That paper is probably the first one in which the terms 'lens' and 'learner' meet, but the initial observation that a learner whose parameter set is a singleton is actually a lens is due to Jules Hedges; see [16].

There are conceptual and technical distinctions between [16] and the present paper. On the conceptual level, by encoding learners as symmetric lenses, they "hide" learning inside the lens framework and make it a technical rather than conceptual idea. In contrast, we consider parameterization and supervised learning as a fundamental idea and a first-class citizen of the lens framework, which warrants the creation of a new species of lenses. Moreover, while an ordinary lens is a way to invert a functor, a learning lens is a way to invert a parameterized functor, so that learning lenses appear as an extension of the parameterization idea from functors to lenses. (This approach can probably be specified formally by treating parameterization as a suitably defined functorial construction.) Besides technical advantages (working with asymmetric lenses is simpler), our asymmetric model seems more adequate to the problem of learning functions rather than relations. On the technical level, the lens framework we develop in this paper is much more general than that of [16]: we categorify both the parameter space and the model spaces, and we work with lenses with amendment (which allows us to relax the Putget law if needed).

As for the delta lens roots (the point (1,0) in the figure), delta lenses were motivated and formally defined in [12] (the asymmetric case) and [13] (the symmetric case). Categorical foundations for the delta lens theory were developed by Johnson and Rosebrugh in a series of papers (see [20] for references); this line is continued in Clarke's work [6]. The notion of a delta lens with amendments (in both asymmetric and symmetric variants) was defined in [11], and several composition results were proved there. Another extensive body of work within the delta-based area is modelling and implementing model transformations with triple-graph grammars (TGG) [4,23]. TGGs provide an implementation framework for delta lenses, as shown and discussed in [5,19,2], and thus inevitably consider change propagation at a much more concrete level than lenses do. The author is not aware of any work considering functoriality of update policies developed within the TGG framework.

The present paper is probably the first one at the intersection point (1,1) of the plane. Its preliminary results were recently reported at ACT'19 in Oxford to a representative lens community, and no references besides [17] and [16] mentioned above were provided.

## 6 Conclusion

The perspective on Bx presented in the paper is an example of a fruitful interaction between two domains: ML and Bx. In order to be ported to Bx, the compositional approach to ML developed in [17] has to be categorified, as shown in Fig. 8 on p. 194. This opens a whole new program for Bx: checking that currently existing Bx languages and tools are compositional (and well-behaved) in the sense of Def. 6 on p. 190. Wb compositionality is an important practical requirement, as it allows for modular design and testing of bidirectional transformations. Surprisingly, this important requirement has been missing from the agenda of the Bx community; e.g., the recent endeavour to develop an effective benchmark for Bx tools [3] does not discuss it.

In a wider context, the main message of the paper is that the learning idea transcends its applications in ML: it is applicable and usable in many domains in which lenses are applicable, such as model transformations, data migration, and open games [18]. Moreover, categorified learning may perhaps find useful applications in ML itself. In the current ML setting, the object to be learnt is a function f: ℝ<sup>m</sup> → ℝ<sup>n</sup> that, from the OO class modelling perspective, is a very simple structure: it can be seen as one object with a (huge) number of attributes, or perhaps as a predefined set of objects that is not allowed to change during the search; only attribute values may be changed. In the delta lens view, such changes constitute a rather narrow class of updates and thus unjustifiably narrow the search space. Learning with the possibility of changing the dimensions m, n may be an appropriate option in several contexts. On the other hand, while categorification of the model spaces extends the search space, categorification of the parameter space narrows it: we are allowed to replace a parameter p by a parameter p′ only if there is a suitable arrow e: p → p′ in the category **P**. This narrowing may perhaps improve performance. All in all, the interaction between ML and Bx could be bidirectional!

## A Appendices

### A.1 Category of parameterized functors *pCat*

Category pCat has all small categories as objects. pCat-arrows **A** → **B** are *parameterized functors* (*p-functors*), i.e., functors f: **P** → [**A**, **B**] with **P** a small category of *parameters* and [**A**, **B**] the category of functors from **A** to **B** and their natural transformations. For an object p and an arrow e: p → p′ in **P**, we write f<sub>p</sub> for the functor f(p): **A** → **B** and f<sub>e</sub> for the natural transformation f(e): f<sub>p</sub> ⇒ f<sub>p′</sub>. We will write p-functors as labelled arrows f: **A** →<sub>**P**</sub> **B**. As Cat is Cartesian closed, we have a natural isomorphism between Cat(**P**, [**A**, **B**]) and Cat(**P**×**A**, **B**) and can reformulate the above definition in an equivalent way with functors **P**×**A** → **B**. We prefer the former formulation, as it corresponds to the notation f: **A** →<sub>**P**</sub> **B**, visualizing **P** as a hidden state of the transformation, which seems adequate to the intuition of 'parameterized' in our context. (If some technicalities are easier to see with the product formulation, we will switch to the product view, currying and uncurrying without special mention.) Sequential composition of f: **A** →<sub>**P**</sub> **B** and g: **B** →<sub>**Q**</sub> **C** is f.g: **A** →<sub>**P**×**Q**</sub> **C**, given by (f.g)<sub>pq</sub> ≝ f<sub>p</sub>.g<sub>q</sub> for objects, i.e., pairs p ∈ **P**, q ∈ **Q**, and by the Godement product of natural transformations for arrows in **P**×**Q**. That is, given a pair e: p → p′ in **P** and h: q → q′ in **Q**, we define the transformation (f.g)<sub>eh</sub>: f<sub>p</sub>.g<sub>q</sub> ⇒ f<sub>p′</sub>.g<sub>q′</sub> to be the Godement product f<sub>e</sub> ∗ g<sub>h</sub>.

Any category **A** gives rise to a p-functor Id<sub>**A**</sub>: **A** →<sub>**1**</sub> **A**, whose parameter space is the singleton category **1** with the only object ∗, Id<sub>**A**</sub>(∗) = id<sub>**A**</sub>, and Id<sub>**A**</sub>(id<sub>∗</sub>): id<sub>**A**</sub> ⇒ id<sub>**A**</sub> is the identity transformation. It is easy to see that the p-functors Id<sub>\_</sub> are units of sequential composition. To ensure associativity, we need to consider p-functors up to an equivalence of their parameter spaces. Two parallel p-functors f: **A** →<sub>**P**</sub> **B** and f̂: **A** →<sub>**P̂**</sub> **B** are *equivalent* if there is an isomorphism α: **P** → **P̂** such that the two parallel functors f: **P** → [**A**, **B**] and α; f̂: **P** → [**A**, **B**] are naturally isomorphic; then we write f ≈<sub>α</sub> f̂. It is easy to see that if f ≈<sub>α</sub> f̂: **A** → **B** and g ≈<sub>β</sub> ĝ: **B** → **C**, then f; g ≈<sub>α×β</sub> f̂; ĝ: **A** → **C**, i.e., sequential composition is stable under equivalence. Below we identify p-functors with their equivalence classes. Using the natural isomorphism (**P**×**Q**)×**R** ≅ **P**×(**Q**×**R**), strict associativity of functor composition, and strict associativity of the Godement product, we conclude that sequential composition of (equivalence classes of) p-functors is strictly associative. Hence, pCat is a category.

Our next goal is to supply it with a monoidal structure, which we borrow from the sm-category (Cat,×), whose tensor is given by the product. There is an identity-on-objects embedding (Cat,×) → pCat that maps a functor f: **A** → **B** to the p-functor f̄: **A** → **B** whose parameter space

Fig. 7: The embeddings relating (Set,×), (Cat,×), pSet and pCat

is the singleton category **1**. Moreover, as this embedding is a functor, the coherence equations for the associators and unitors that hold in (Cat,×) hold in pCat as well (this proof idea is borrowed from [17]). In this way, pCat becomes an sm-category. In a similar way, we define the sm-category pSet of small sets and parametrized functions between them — the codiscrete version of pCat. The diagram in Fig. 7 shows how these categories are related.

### A.2 Ala-lenses as categorification of ML-learners

Figure 8 shows a discrete two-dimensional plane in which each axis has three points: a space is a singleton, a set, or a category, encoded by the coordinates 0, 1, 2 respectively. Each point x<sub>ij</sub> is then the location of a corresponding sm-category of

Fig. 8: The universe of categories of learning delta lenses

(asymmetric) learning (delta) lenses. Category {*1*} is a terminal category whose only arrow is the identity lens *1* = (id<sub>**1**</sub>, id<sub>**1**</sub>): **1** → **1**, propagating from the terminal category **1** to itself. The label ∗ refers to the codiscrete specialization of the construct being labelled: L<sup>∗</sup> means codiscrete learning (i.e., the parameter space **P** is a set considered as a codiscrete category) and aLens<sup>∗</sup> refers to codiscrete model spaces. The category of learners defined in [17] is located at point (1,1), and the category of learning delta lenses with amendments defined in the present paper is located at (2,2). There are also two semi-categorified species of learning lenses: categorical learners at point (1,2) and codiscretely learning delta lenses at (2,1), both of which are special cases of ala-lenses.

## References


on Theory and Practice of Software, Bx@ETAPS 2017, Uppsala, Sweden, April 29, 2017, CEUR Workshop Proceedings, vol. 1827. CEUR-WS.org (2017), http://ceur-ws.org/Vol-1827



#### Non-idempotent intersection types in logical form*

Thomas Ehrhard

Université de Paris, IRIF, CNRS, F-75013 Paris, France, ehrhard@irif.fr, https://www.irif.fr/~ehrhard/

Abstract. Intersection types are an essential tool in the analysis of operational and denotational properties of lambda-terms and functional programs. Among them, non-idempotent intersection types provide precise quantitative information about the evaluation of terms and programs. However, unlike simple or second-order types, intersection types cannot be considered as a logical system because the application rule (or the intersection rule, depending on the presentation of the system) involves a condition stipulating that the proofs of premises must have the same structure. Using earlier work introducing an indexed version of Linear Logic, we show that non-idempotent typing can be given a logical form in a system where formulas represent hereditarily indexed families of intersection types.

Keywords: Lambda Calculus · Denotational Semantics · Intersection Types · Linear Logic

## Introduction

Intersection types, introduced in the work of Coppo and Dezani [4,5] and developed since then by many authors, are still a very active research topic. As quite clearly explained in [13], the Coppo and Dezani intersection type system DΩ can be understood as a syntactic presentation of the denotational interpretation of λ-terms in Engeler's model, which is a model of the pure λ-calculus in the cartesian closed category of prime-algebraic complete lattices and Scott-continuous functions.

Intersection types can be considered as formulas of the propositional calculus with implication ⇒ and conjunction ∧ as connectives. However, as pointed out by Hindley [12], intersection type deduction rules depart drastically from the standard logical rules of intuitionistic logic (and of any standard logical system) by the fact that, in the ∧-introduction rule, it is assumed that the proofs of the two premises are typings of the same λ-term, which means that, in some sense made precise by the typing system itself, they have the same structure. Such requirements on proof premises, and not only on the formulas proven in the premises,


* Partially supported by the project ANR-19-CE48-0014 PPS.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 198–216, 2020. https://doi.org/10.1007/978-3-030-45231-5_11

are absent from standard (intuitionistic or classical) logical systems, where the proofs of premises are completely independent from each other. Many authors have addressed this issue; we refer to [14] for a discussion of several solutions, which mainly focus on the design of à la Church presentations of intersection typing systems, thus enriching λ-terms with additional structure. Among the most recent and convincing contributions to this line of research we should certainly mention [15].

In our "new" approach to this problem — not so new actually, since it dates back to [3] — we change formulas instead of changing terms. It is based on a specific model of Linear Logic (and thus of the λ-calculus): the relational model. It is fair to credit Girard for the introduction of this model, since it appears at least implicitly in [11]. It was probably known by many people in the Linear Logic community as a piece of folklore since the early 1990's and is presented formally in [3]. In this quite simple and canonical denotational model, types are interpreted as sets (without any additional structure) and a closed term of type σ is interpreted as a subset of the interpretation of σ. It is quite easy to define, in this semantic framework, analogues of the usual models of the pure λ-calculus such as Scott's D<sub>∞</sub> or Engeler's model, which in some sense are simpler than the original ones since the sets interpreting types need not be pre-ordered. As explained in the work of De Carvalho [6,7], the intersection type counterpart of this semantics is a typing system where "intersection" is non-idempotent (in sharp contrast with the original systems introduced by Coppo and Dezani), sometimes called system R. Notice that the precise connection between the idempotent and non-idempotent approaches is analyzed in [8], in a quite general Linear Logic setting, by means of an extensional collapse.

In order to explain our approach, we first restrict to simple types, interpreted as follows in the relational model: a basic type α is interpreted as a given set ⟦α⟧ and the type σ ⇒ τ is interpreted as the set Mfin(⟦σ⟧) × ⟦τ⟧ (where Mfin(E) is the set of finite multisets of elements of E). Remember indeed that intersection types can be considered as a syntactic presentation of denotational semantics, so it makes sense to define intersection types relative to simple types (in the spirit of [10]) as we do in Section 3: an intersection type relative to the base type α is an element of ⟦α⟧, and an intersection type relative to σ ⇒ τ is a pair ([a<sub>1</sub>,...,a<sub>n</sub>], b) where the a<sub>i</sub> are intersection types relative to σ and b is an intersection type relative to τ; with more usual notations<sup>1</sup>, ([a<sub>1</sub>,...,a<sub>n</sub>], b) would be written (a<sub>1</sub> ∧ ··· ∧ a<sub>n</sub>) → b. Then, given a type σ, the main idea consists in representing an indexed family of elements of ⟦σ⟧ as a formula of a new logical system.
If σ = (ϕ ⇒ ψ) then the family can be written<sup>2</sup> ([a<sub>k</sub> | k ∈ K and u(k) = j], b<sub>j</sub>)<sub>j∈J</sub>, where J and K are indexing sets, u : K → J is a function such that u<sup>−1</sup>({j}) is finite for all j ∈ J, (b<sub>j</sub>)<sub>j∈J</sub> is a family of elements of ⟦ψ⟧ (represented by a formula B) and (a<sub>k</sub>)<sub>k∈K</sub> is a family of elements of ⟦ϕ⟧ (represented by a formula A): in that case we introduce the implicative formula (A ⇒<sub>u</sub> B) to represent the family

<sup>1</sup> Notations that we prefer not to use, to avoid confusion between these two levels of typing.

<sup>2</sup> We use [···] for denoting multisets much as one uses {···} for denoting sets; the only difference is that multiplicities are taken into account.

([a<sub>k</sub> | k ∈ K and u(k) = j], b<sub>j</sub>)<sub>j∈J</sub>. It is clear that a family of intersection types generally has infinitely many representations as such formulas; this huge redundancy makes it possible to establish a tight link between inhabitation of intersection types and provability of the formulas representing them (in an indexed version LJ(I) of intuitionistic logic). Such a correspondence is exhibited in Section 3 in the simply typed setting, and the idea is quite simple:

given a type σ, a family (a<sub>j</sub>)<sub>j∈J</sub> of elements of ⟦σ⟧ and a closed λ-term M of type σ, it is equivalent to say that M : a<sub>j</sub> holds for all j and to say that some (and actually any) formula A representing (a<sub>j</sub>)<sub>j∈J</sub> has an LJ(I) proof<sup>3</sup> whose underlying λ-term is M.

In Section 4 we extend this approach to the untyped λ-calculus, taking as underlying model of the pure λ-calculus our relational version R<sub>∞</sub> of Scott's D<sub>∞</sub>. We define an adapted version of LJ(I) and establish a similar correspondence, with some slight modifications due to the specificities of R<sub>∞</sub>.

## 1 Notations and preliminary definitions

If E is a set, a finite multiset of elements of E is a function m : E → N such that the set {a ∈ E | m(a) ≠ 0} (called the domain of m) is finite. The cardinality of such a multiset m is #m = Σ<sub>a∈E</sub> m(a). We use + for the obvious addition operation on multisets, and if a<sub>1</sub>,...,a<sub>n</sub> are elements of E, we use [a<sub>1</sub>,...,a<sub>n</sub>] for the corresponding multiset (taking multiplicities into account); for instance [0, 1, 0, 2, 1] is the multiset m of elements of N such that m(0) = 2, m(1) = 2, m(2) = 1 and m(i) = 0 for i > 2. If (a<sub>i</sub>)<sub>i∈I</sub> is a family of elements of E and J is a finite subset of I, we use [a<sub>i</sub> | i ∈ J] for the multiset of elements of E which maps a ∈ E to the number of elements i ∈ J such that a<sub>i</sub> = a (which is finite since J is). We use Mfin(E) for the set of finite multisets of elements of E.
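These conventions are easy to mirror concretely; as a small sanity check, here is a sketch (not part of the paper) using Python's `collections.Counter` as a stand-in for Mfin(E):

```python
from collections import Counter

def mset(*elems):
    """The finite multiset [a1,...,an], modelled as a Counter."""
    return Counter(elems)

m = mset(0, 1, 0, 2, 1)                 # the multiset [0,1,0,2,1] of the text
assert m[0] == 2 and m[1] == 2 and m[2] == 1 and m[3] == 0
assert sum(m.values()) == 5             # the cardinality #m
assert mset(0, 1) + mset(0) == mset(0, 0, 1)   # the addition operation +
```

Counter addition and equality match multiset sum and equality directly, which is why no extra code is needed.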

We use + to denote set union when we want to stress the fact that the involved sets are disjoint. A function u : J → K is almost injective if #u<sup>−1</sup>({k}) is finite for each k ∈ K (equivalently, the inverse image of any finite subset of K under u is finite). If s = (a<sub>1</sub>,...,a<sub>n</sub>) is a sequence of elements of E and i ∈ {1,...,n}, we use (s) \ i for the sequence (a<sub>1</sub>,...,a<sub>i−1</sub>, a<sub>i+1</sub>,...,a<sub>n</sub>). Given sets E and F, we use F<sup>E</sup> for the set of functions from E to F. The elements of F<sup>E</sup> are sometimes considered as functions u (with functional notation u(e) for application) and sometimes as indexed families a (with index notation a<sub>e</sub>), especially when E is countable.
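Almost injectivity is a condition on fibres. For the finite index functions used in examples it can be inspected directly; the helper below is hypothetical, not from the paper:

```python
from collections import defaultdict

def fibres(u):
    """Group the domain of u : K -> J (a dict) by value: j -> u^-1({j})."""
    inv = defaultdict(set)
    for k, j in u.items():
        inv[j].add(k)
    return dict(inv)

# u : {1,2,3} -> {'a','b'}; every fibre is finite, so u is almost injective
u = {1: 'a', 2: 'a', 3: 'b'}
assert fibres(u) == {'a': {1, 2}, 'b': {3}}
```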

If i ∈ {1,...,n} and j ∈ {1,...,n−1}, we define s(j, i) ∈ {1,...,n} as follows: s(j, i) = j if j < i and s(j, i) = j + 1 if j ≥ i.

<sup>3</sup> Any such proof can be stripped of its indexing data, giving rise to a proof of σ in intuitionistic logic.

## 2 The relational model of the *λ*-calculus

Let **Rel**! be the category whose objects are sets<sup>4</sup> and **Rel**!(X, Y) = P(Mfin(X) × Y), with Id<sub>X</sub> = {([a], a) | a ∈ X} and composition of s ∈ **Rel**!(X, Y) and t ∈ **Rel**!(Y, Z) given by

$$\begin{aligned} t \circ s &= \left\{ (m\_1 + \dots + m\_k, c) \mid \\ &\exists b\_1, \dots, b\_k \in Y \ ([b\_1, \dots, b\_k], c) \in t \text{ and } \forall j \,(m\_j, b\_j) \in s \right\} \ . \end{aligned}$$

It is easily checked that this composition law is associative and that Id is neutral for composition<sup>5</sup>. This category has all countable products: if (X<sub>j</sub>)<sub>j∈J</sub> is a countable family of sets, their product is X = &<sub>j∈J</sub> X<sub>j</sub> = ⋃<sub>j∈J</sub> {j} × X<sub>j</sub>, with projections (pr<sub>j</sub>)<sub>j∈J</sub> given by pr<sub>j</sub> = {([(j, a)], a) | a ∈ X<sub>j</sub>} ∈ **Rel**!(X, X<sub>j</sub>); and if (s<sub>j</sub>)<sub>j∈J</sub> is a family of morphisms s<sub>j</sub> ∈ **Rel**!(Y, X<sub>j</sub>), then their tupling is ⟨s<sub>j</sub>⟩<sub>j∈J</sub> = {([a], (j, b)) | j ∈ J and ([a], b) ∈ s<sub>j</sub>} ∈ **Rel**!(Y, X).
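The composition law can be executed directly on finite examples. In the sketch below (an illustration, not part of the paper), a morphism of **Rel**! is a finite set of pairs (multiset, point), with multisets encoded as sorted tuples so that they are hashable:

```python
from itertools import product

def ms(*xs):
    """A finite multiset as a sorted tuple."""
    return tuple(sorted(xs, key=repr))

def ms_sum(multisets):
    """The multiset sum m1 + ... + mk."""
    return tuple(sorted((x for m in multisets for x in m), key=repr))

def compose(t, s):
    """t o s for s in Rel!(X,Y) and t in Rel!(Y,Z), following the formula above."""
    out = set()
    for bs, c in t:                                   # ([b1,...,bk], c) in t
        # for each bj, pick some (mj, bj) in s, in all possible ways
        choices = [[m for m, b in s if b == bj] for bj in bs]
        for pick in product(*choices):
            out.add((ms_sum(pick), c))                # (m1 + ... + mk, c)
    return out

def identity(X):
    return {(ms(a), a) for a in X}                    # Id_X = {([a],a) | a in X}

s = {(ms(0), 'y'), (ms(0, 1), 'y')}                   # some s in Rel!({0,1},{'y'})
assert compose(identity({'y'}), s) == s               # Id is neutral
assert compose(s, identity({0, 1})) == s
```

Note that when the multiset [b1,...,bk] is empty, the inner loop still produces the pair ([], c), exactly as the formula requires.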

The category **Rel**! is cartesian closed, with object of morphisms from X to Y the set (X ⇒ Y) = Mfin(X) × Y; the evaluation morphism Ev ∈ **Rel**!((X ⇒ Y) & X, Y) is given by Ev = {([(1, ([a<sub>1</sub>,...,a<sub>k</sub>], b)), (2, a<sub>1</sub>),...,(2, a<sub>k</sub>)], b) | a<sub>1</sub>,...,a<sub>k</sub> ∈ X and b ∈ Y}. The transpose (or currying) of s ∈ **Rel**!(Z & X, Y) is Cur(s) ∈ **Rel**!(Z, X ⇒ Y), given by Cur(s) = {([c<sub>1</sub>,...,c<sub>n</sub>], ([a<sub>1</sub>,...,a<sub>k</sub>], b)) | ([(1, c<sub>1</sub>),...,(1, c<sub>n</sub>), (2, a<sub>1</sub>),...,(2, a<sub>k</sub>)], b) ∈ s}.

Relational *D<sub>∞</sub>*. Let R<sub>∞</sub> be the least set such that (m<sub>0</sub>, m<sub>1</sub>,...) ∈ R<sub>∞</sub> as soon as m<sub>0</sub>, m<sub>1</sub>,... are finite multisets of elements of R<sub>∞</sub> which are almost all equal to []. Notice in particular that e = ([], [],...) ∈ R<sub>∞</sub> and satisfies e = ([], e). By construction we have R<sub>∞</sub> = Mfin(R<sub>∞</sub>) × R<sub>∞</sub>, that is R<sub>∞</sub> = (R<sub>∞</sub> ⇒ R<sub>∞</sub>), and hence R<sub>∞</sub> is a model of the pure λ-calculus in **Rel**! which also satisfies the η-rule. See [1] for general facts on this kind of model.
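These equations can be probed concretely by encoding an element (m<sub>0</sub>, m<sub>1</sub>,...) of R<sub>∞</sub> as a tuple of multisets with the trailing empty multisets dropped, so that e becomes the empty tuple. This finitary encoding is an assumption of the sketch, not the paper's definition; multisets are again tuples:

```python
def norm(seq):
    """Drop trailing empty multisets: canonical form of (m0, m1, ...)."""
    seq = list(seq)
    while seq and seq[-1] == ():
        seq.pop()
    return tuple(seq)

def split(d):
    """One direction of R_inf = Mfin(R_inf) x R_inf: d |-> (m0, (m1, m2, ...))."""
    return (d[0] if d else ()), norm(d[1:])

def pair(m0, rest):
    """The inverse direction: (m0, (m1, m2, ...)) |-> (m0, m1, m2, ...)."""
    return norm((m0,) + rest)

e = ()                          # e = ([], [], ...): all components empty
assert split(e) == ((), e)      # the equation e = ([], e)
d = pair((e,), e)               # the element ([e], [], [], ...)
assert split(d) == ((e,), e) and d != e
```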

## 3 The simply typed case

We assume to be given a set of type atoms α, β,... and a set of variables x, y,...; types and terms are given as usual by σ, τ,... ::= α | σ ⇒ τ and M, N,... ::= x | (M) N | λx<sup>σ</sup> M.

With each type atom α we associate a set ⟦α⟧. This interpretation is extended to all types by ⟦σ ⇒ τ⟧ = Mfin(⟦σ⟧) × ⟦τ⟧. The relational semantics of this λ-calculus can be described as a non-idempotent intersection type system, with judgments of shape x<sub>1</sub> : m<sub>1</sub> : σ<sub>1</sub>,...,x<sub>n</sub> : m<sub>n</sub> : σ<sub>n</sub> ⊢ M : a : σ, where the x<sub>i</sub> are pairwise distinct variables, M is a term, a ∈ ⟦σ⟧ and m<sub>i</sub> ∈ Mfin(⟦σ<sub>i</sub>⟧) for each i. Here are the typing rules:

$$\frac{j \neq i \Rightarrow m\_j = [\,] \text{ and } m\_i = [\,a\,]}{(x\_j : m\_j : \sigma\_j)\_{j=1}^n \vdash x\_i : a : \sigma\_i} \qquad \frac{\Phi, x : m : \sigma \vdash M : b : \tau}{\Phi \vdash \lambda x^{\sigma}\, M : (m, b) : \sigma \Rightarrow \tau}$$

<sup>4</sup> We can restrict to countable sets.

<sup>5</sup> This results from the fact that **Rel**! arises as the Kleisli category of the LL model of sets and relations; see [3] for instance.

$$\frac{\Phi \vdash M : ([\,a\_1, \ldots, a\_k\,], b) : \sigma \Rightarrow \tau \qquad (\Phi\_l \vdash N : a\_l : \sigma)\_{l=1}^k}{\Psi \vdash (M)\, N : b : \tau}$$

where Φ = (x<sub>i</sub> : m<sub>i</sub> : σ<sub>i</sub>)<sup>n</sup><sub>i=1</sub>, Φ<sub>l</sub> = (x<sub>i</sub> : m<sup>l</sup><sub>i</sub> : σ<sub>i</sub>)<sup>n</sup><sub>i=1</sub> for l = 1,...,k, and Ψ = (x<sub>i</sub> : m<sub>i</sub> + Σ<sup>k</sup><sub>l=1</sub> m<sup>l</sup><sub>i</sub> : σ<sub>i</sub>)<sup>n</sup><sub>i=1</sub>.
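When base types are interpreted by small finite sets, these rules are directly executable: the derivable judgments for a term M are exactly the elements of its relational semantics. The following sketch is an illustration, not from the paper; it assumes de Bruijn-style variable indices into the context, encodes multisets as sorted tuples, and enumerates ⟦σ⟧ only up to multiset size `kmax` at arrow types (a truncation, exact for atomic contexts):

```python
from itertools import product

def msets(X, kmax):
    """All multisets (sorted tuples) over X with at most kmax elements."""
    out, elems = {()}, sorted(X, key=repr)
    for k in range(1, kmax + 1):
        out |= {tuple(sorted(t, key=repr)) for t in product(elems, repeat=k)}
    return out

def interp(sigma, base, kmax=2):
    """[sigma]: exact for atoms, truncated at multiset size kmax for arrows."""
    if sigma[0] == 'atom':
        return set(base[sigma[1]])
    _, rho, tau = sigma
    return set(product(msets(interp(rho, base, kmax), kmax),
                       interp(tau, base, kmax)))

def madd(m1, m2):
    return tuple(sorted(m1 + m2, key=repr))

def rel_sem(term, ctx, base, kmax=2):
    """All ((m1,...,mn), a) with x1:m1:s1,...,xn:mn:sn |- term : a."""
    if term[0] == 'var':                      # axiom rule: m_j = [] for j != i
        i = term[1]
        return {(tuple((a,) if j == i else () for j in range(len(ctx))), a)
                for a in interp(ctx[i], base, kmax)}
    if term[0] == 'lam':                      # abstraction rule
        _, sigma, body = term
        return {(ms[:-1], (ms[-1], b))
                for ms, b in rel_sem(body, ctx + (sigma,), base, kmax)}
    _, M, N = term                            # application rule
    sN = rel_sem(N, ctx, base, kmax)
    out = set()
    for ms, (marg, b) in rel_sem(M, ctx, base, kmax):
        choices = [[m for m, a in sN if a == al] for al in marg]
        for pick in product(*choices):        # premises (Phi_l |- N : a_l)
            tot = ms
            for ml in pick:                   # Psi: pointwise multiset sum
                tot = tuple(madd(x, y) for x, y in zip(tot, ml))
            out.add((tot, b))
    return out

alpha = ('atom', 'a')
ident = ('lam', alpha, ('var', 0))            # the term  \x^alpha. x
assert rel_sem(ident, (), {'a': {1, 2}}) == \
    {((), ((1,), 1)), ((), ((2,), 2))}        # matches Id = {([a], a) | a}
```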

### 3.1 Why do we need another system?

The trouble with this deduction system is that it cannot be considered as the term-decorated version of an underlying "logical system for intersection types" proving sequents of shape m<sub>1</sub> : σ<sub>1</sub>,...,m<sub>n</sub> : σ<sub>n</sub> ⊢ a : σ (where the non-idempotent intersection types m<sub>i</sub> and a are considered as logical formulas, the ordinary types σ<sub>i</sub> playing the role of "kinds"), because, in the application rule above, it is required that all the proofs of the k right-hand-side premises have the same shape, given by the λ-term N. We now propose a "logical system" derived from [3] which, in some sense, solves this issue. The main idea is quite simple and relies on three principles: (1) hereditarily replace multisets with indexed families in intersection types, (2) instead of proving single types, prove indexed families of hereditarily indexed types, and (3) represent such families (of hereditarily indexed types) syntactically as formulas of a new system of indexed logic.

### 3.2 Minimal LJ(*I*)

We now define the syntax of indexed formulas. Assume to be given an infinite countable set I of indices. We define indexed types A; with each such type we associate an underlying (simple) type, a set d(A) ⊆ I of indices, and a family ⟨A⟩ ∈ ⟦σ⟧<sup>d(A)</sup>, where σ is the underlying type of A. These formulas are given by the following inductive definition:

– α[f], where α is a type atom, J ⊆ I and f ∈ ⟦α⟧<sup>J</sup>: the underlying type is α, d(α[f]) = J and ⟨α[f]⟩ = f;
– (A ⇒<sub>u</sub> B), where u : d(A) → d(B) is almost injective: the underlying type is σ ⇒ τ (with σ, τ the underlying types of A and B), d(A ⇒<sub>u</sub> B) = d(B) and ⟨A ⇒<sub>u</sub> B⟩<sub>j</sub> = ([⟨A⟩<sub>k</sub> | u(k) = j], ⟨B⟩<sub>j</sub>) for j ∈ d(B).
Proposition 1. Let σ be a type, J be a subset of I and f ∈ ⟦σ⟧<sup>J</sup>. There is a formula A such that the underlying type of A is σ, d(A) = J and ⟨A⟩ = f (actually, there are infinitely many such A as soon as σ is not an atom and J ≠ ∅).

Proof. The proof is by induction on σ. If σ is an atom α, then we take A = α[f]. Assume that σ = (ρ ⇒ τ), so that f(j) = (m<sub>j</sub>, b<sub>j</sub>) with m<sub>j</sub> ∈ Mfin(⟦ρ⟧) and b<sub>j</sub> ∈ ⟦τ⟧. Since each m<sub>j</sub> is finite and I is infinite, we can find a family (K<sub>j</sub>)<sub>j∈J</sub> of pairwise disjoint finite subsets of I such that #K<sub>j</sub> = #m<sub>j</sub>. Let K = ⋃<sub>j∈J</sub> K<sub>j</sub>; there is a function g : K → ⟦ρ⟧ such that m<sub>j</sub> = [g(k) | k ∈ K<sub>j</sub>] for each j ∈ J (choose first an enumeration g<sub>j</sub> : K<sub>j</sub> → ⟦ρ⟧ of m<sub>j</sub> for each j, and then define g(k) = g<sub>j</sub>(k), where j is the unique element of J such that k ∈ K<sub>j</sub>). Let u : K → J be the unique function such that k ∈ K<sub>u(k)</sub> for all k ∈ K; since each K<sub>j</sub> is finite, this function u is almost injective. By inductive hypothesis there is a formula A with underlying type ρ, d(A) = K and ⟨A⟩ = g, and there is a formula B with underlying type τ, d(B) = J and ⟨B⟩ = (b<sub>j</sub>)<sub>j∈J</sub>.
Then the formula A ⇒<sub>u</sub> B is well formed (since u is an almost injective function from d(A) = K to d(B) = J), has underlying type σ, and satisfies d(A ⇒<sub>u</sub> B) = J and ⟨A ⇒<sub>u</sub> B⟩ = f, as contended. ✷
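The proof is constructive, and the construction transcribes almost literally into code. The sketch below is an illustration under assumed encodings (none of them from the paper): a type is `('atom', name)` or `('arrow', rho, tau)`; a formula is `('atomf', f)` for α[f] with f a dict, or `('imp', A, u, B)` for A ⇒<sub>u</sub> B with u a dict d(A) → d(B); fresh indices of I are drawn from a counter:

```python
import itertools

def build(sigma, f, fresh=None):
    """Proposition 1: from a type sigma and a family f : J -> [sigma] (a dict),
    build a formula A with underlying type sigma, d(A) = dom(f) and <A> = f."""
    if fresh is None:
        fresh = itertools.count(0)        # supply of unused indices from I
    if sigma[0] == 'atom':
        return ('atomf', dict(f))
    _, rho, tau = sigma
    g, u, b = {}, {}, {}
    for j, (m_j, b_j) in f.items():       # f(j) = (m_j, b_j)
        b[j] = b_j
        for a in m_j:                     # K_j: one fresh index per element
            k = next(fresh)
            g[k] = a                      # m_j = [ g(k) | k in K_j ]
            u[k] = j                      # u almost injective by construction
    return ('imp', build(rho, g, fresh), u, build(tau, b, fresh))

def family(A):
    """Recover <A>, the family of intersection types the formula represents."""
    if A[0] == 'atomf':
        return dict(A[1])
    _, A1, u, B = A
    g = family(A1)
    return {j: (tuple(sorted((g[k] for k in u if u[k] == j), key=repr)), b_j)
            for j, b_j in family(B).items()}

sigma = ('arrow', ('atom', 'al'), ('atom', 'be'))
f = {0: (('a', 'a'), 'b'), 1: ((), 'b')}  # f(0) = ([a,a], b), f(1) = ([], b)
assert family(build(sigma, f)) == f       # round trip: <A> = f
```

Multisets are encoded as sorted tuples here, which makes the round-trip check a plain equality.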

As a consequence, for any type σ and any element a of ⟦σ⟧ (so a is a non-idempotent intersection type of kind σ), one can find a formula A such that the underlying type of A is σ, d(A) = {j} (where j is an arbitrary element of I) and ⟨A⟩<sub>j</sub> = a. In other words, any intersection type can be represented as a formula (in infinitely many different ways in general, of course, but up to renaming of indices, that is, up to "hereditary α-equivalence", this representation is unique).

For any formula A and J ⊆ I, we define a formula A↾J such that the underlying type of A↾J is that of A, d(A↾J) = d(A) ∩ J and ⟨A↾J⟩ = ⟨A⟩↾J. The definition is by induction on A:

– α[f]↾J = α[f↾J]
– (A ⇒<sub>u</sub> B)↾J = (A↾K ⇒<sub>v</sub> B↾J), where K = u<sup>−1</sup>(d(B) ∩ J) and v = u↾K.

Let u : d(A) → J be a bijection (so that u(d(A)) = J); we define a formula u<sub>∗</sub>(A) such that the underlying type of u<sub>∗</sub>(A) is that of A, d(u<sub>∗</sub>(A)) = u(d(A)) and ⟨u<sub>∗</sub>(A)⟩<sub>j</sub> = ⟨A⟩<sub>u<sup>−1</sup>(j)</sub>. The definition is by induction on A:

– u<sub>∗</sub>(α[f]) = α[f ∘ u<sup>−1</sup>]
– u<sub>∗</sub>(A ⇒<sub>v</sub> B) = (A ⇒<sub>u∘v</sub> u<sub>∗</sub>(B)).
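Both auxiliary operations are plain structural recursions. A sketch over an assumed encoding of formulas (`('atomf', f)` for α[f] with f a dict, `('imp', A, u, B)` for A ⇒<sub>u</sub> B with u a dict; this encoding is ours, not the paper's):

```python
def restrict(A, J):
    """A |` J, by induction on A: cut the outer index set down to J."""
    if A[0] == 'atomf':
        return ('atomf', {j: v for j, v in A[1].items() if j in J})
    _, A1, u, B = A
    K = {k for k in u if u[k] in J}           # K = u^-1(d(B) ∩ J)
    return ('imp', restrict(A1, K), {k: u[k] for k in K}, restrict(B, J))

def deloc(u, A):
    """u_*(A) for a bijection u on d(A): reindex the outer level along u."""
    if A[0] == 'atomf':
        return ('atomf', {u[j]: v for j, v in A[1].items()})   # f o u^-1
    _, A1, w, B = A
    return ('imp', A1, {k: u[w[k]] for k in w}, deloc(u, B))

A = ('imp', ('atomf', {10: 'a', 11: 'a'}), {10: 0, 11: 1},
     ('atomf', {0: 'b', 1: 'c'}))
assert restrict(A, {0}) == ('imp', ('atomf', {10: 'a'}), {10: 0},
                            ('atomf', {0: 'b'}))
assert deloc({0: 5, 1: 6}, A)[3] == ('atomf', {5: 'b', 6: 'c'})
```

Note that `restrict` recurses on both sides of an implication (the left along K), while `deloc` only reindexes the codomain side, mirroring the two inductive definitions above.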

Using these two auxiliary notions, we can give a set of three deduction rules for a minimal natural deduction system proving formulas of this indexed intuitionistic logic. This logical system derives sequents of the shape

$$A\_1^{u\_1}, \dots, A\_n^{u\_n} \vdash B \tag{1}$$

where, for each i = 1,...,n, the function u<sub>i</sub> : d(A<sub>i</sub>) → d(B) is almost injective (it is not required that d(B) = ⋃<sup>n</sup><sub>i=1</sub> u<sub>i</sub>(d(A<sub>i</sub>))). Notice that the expressions A<sup>u<sub>i</sub></sup><sub>i</sub> are not formulas; the construction A<sup>u</sup> is part of the syntax of sequents, just as the "," separating these pseudo-formulas. Given a formula A with underlying type σ and u : d(A) → J almost injective, it is nevertheless convenient to define ⟨A<sup>u</sup>⟩ ∈ Mfin(⟦σ⟧)<sup>J</sup> by ⟨A<sup>u</sup>⟩<sub>j</sub> = [⟨A⟩<sub>k</sub> | u(k) = j]. In particular, when u is a bijection, ⟨A<sup>u</sup>⟩<sub>j</sub> = [⟨A⟩<sub>u<sup>−1</sup>(j)</sub>].

The crucial point here is that such a sequent (1) involves no λ-term.

The main difference between the original system LL(I) of [3] and the present system is the way axioms are dealt with. In LL(I) there is no explicit identity axiom, only "atomic axioms" restricted to the basic constants of LL; indeed, it is well known that in LL all identity axioms can be η-expanded, leading to proofs using only such atomic axioms. In the λ-calculus, and especially in the untyped λ-calculus we want to deal with in the next sections, such η-expansions are hard to handle, so we prefer to use explicit identity axioms.

The axiom is

$$\frac{j \neq i \Rightarrow \mathbf{d}(A\_j) = \emptyset \text{ and } u\_i \text{ is a bijection}}{A\_1^{u\_1}, \dots, A\_n^{u\_n} \vdash u\_{i\*}(A\_i)}$$

so that, for j ≠ i, the function u<sub>j</sub> is the empty function. A special case is

$$\frac{j \neq i \Rightarrow \mathbf{d}(A\_j) = \emptyset \text{ and } u\_i \text{ is the identity function}}{A\_1^{u\_1}, \dots, A\_n^{u\_n} \vdash A\_i}$$

which may look more familiar, but the general axiom rule, allowing one to "delocalize" the proven formula A<sub>i</sub> by an arbitrary bijection u<sub>i</sub>, is required, as we shall see. The ⇒-introduction rule is quite simple:

$$\frac{A\_1^{u\_1}, \dots, A\_n^{u\_n}, A^u \vdash B}{A\_1^{u\_1}, \dots, A\_n^{u\_n} \vdash A \Rightarrow\_u B}$$

Last, the ⇒-elimination rule is more complicated (from a Linear Logic point of view, this is due to the fact that it combines three LL logical rules: elimination, contraction and promotion). We have the deduction

$$\frac{C\_1^{u\_1}, \dots, C\_n^{u\_n} \vdash A \Rightarrow\_u B \qquad D\_1^{v\_1}, \dots, D\_n^{v\_n} \vdash A}{E\_1^{w\_1}, \dots, E\_n^{w\_n} \vdash B}$$

under the following conditions, to be satisfied by the involved formulas and functions: for each i = 1,...,n one has d(C<sub>i</sub>) ∩ d(D<sub>i</sub>) = ∅, d(E<sub>i</sub>) = d(C<sub>i</sub>) + d(D<sub>i</sub>), C<sub>i</sub> = E<sub>i</sub>↾d(C<sub>i</sub>), D<sub>i</sub> = E<sub>i</sub>↾d(D<sub>i</sub>), w<sub>i</sub>↾d(C<sub>i</sub>) = u<sub>i</sub>, and w<sub>i</sub>↾d(D<sub>i</sub>) = u ∘ v<sub>i</sub>.

Let π be a deduction tree of the sequent A<sup>u<sub>1</sub></sup><sub>1</sub>,...,A<sup>u<sub>n</sub></sup><sub>n</sub> ⊢ B in this system. By dropping all index information we obtain a derivation tree π̲ of A̲<sub>1</sub>,...,A̲<sub>n</sub> ⊢ B̲ (where A̲ denotes the underlying type of A) and, upon choosing a sequence x⃗ of n pairwise distinct variables, we can associate with this derivation tree a simply typed λ-term π̲<sub>x⃗</sub> which satisfies x<sub>1</sub> : A̲<sub>1</sub>,...,x<sub>n</sub> : A̲<sub>n</sub> ⊢ π̲<sub>x⃗</sub> : B̲.

### 3.3 Basic properties of LJ(*I*)

We prove some basic properties of this logical system. This is also the opportunity to get some acquaintance with it. Notice that in many places we drop the type annotations of variables in λ-terms, first because they are easy to recover, and second because the very same results and proofs are also valid in the untyped setting of Section 4.

Lemma 1 (Weakening). Assume that Φ ⊢ A is provable by a proof π and let B be a formula such that d(B) = ∅. Then Φ′ ⊢ A is provable by a proof π′, where Φ′ is obtained by inserting B<sup>0</sup> (0 the empty function to d(A)) at any place in Φ. Moreover π̲′<sub>x⃗′</sub> = π̲<sub>x⃗</sub>, where x⃗′ is obtained from x⃗ by inserting a dummy variable at the same place.

The proof is an easy induction on the proof of Φ ⊢ A.

Lemma 2 (Relocation). Let π be a proof of (A<sup>u<sub>i</sub></sup><sub>i</sub>)<sup>n</sup><sub>i=1</sub> ⊢ A and let u : d(A) → J be a bijection. There is a proof π′ of (A<sup>u∘u<sub>i</sub></sup><sub>i</sub>)<sup>n</sup><sub>i=1</sub> ⊢ u<sub>∗</sub>(A) such that π̲′<sub>x⃗</sub> = π̲<sub>x⃗</sub>.

The proof is a straightforward induction on π.

Lemma 3 (Restriction). Let π be a proof of (A<sup>u<sub>i</sub></sup><sub>i</sub>)<sup>n</sup><sub>i=1</sub> ⊢ A and let J ⊆ d(A). For i = 1,...,n, let K<sub>i</sub> = u<sub>i</sub><sup>−1</sup>(J) ⊆ d(A<sub>i</sub>) and u′<sub>i</sub> = u<sub>i</sub>↾K<sub>i</sub> : K<sub>i</sub> → J. Then the sequent ((A<sub>i</sub>↾K<sub>i</sub>)<sup>u′<sub>i</sub></sup>)<sup>n</sup><sub>i=1</sub> ⊢ A↾J has a proof π′ such that π̲′<sub>x⃗</sub> = π̲<sub>x⃗</sub>.

Proof. By induction on π. Assume that π consists of an axiom (A<sup>u<sub>j</sub></sup><sub>j</sub>)<sup>n</sup><sub>j=1</sub> ⊢ u<sub>i∗</sub>(A<sub>i</sub>) with d(A<sub>j</sub>) = ∅ if j ≠ i, and u<sub>i</sub> a bijection. With the notations of the lemma, K<sub>j</sub> = ∅ for j ≠ i and u′<sub>i</sub> is a bijection K<sub>i</sub> → J. Moreover u′<sub>i∗</sub>(A<sub>i</sub>↾K<sub>i</sub>) = u<sub>i∗</sub>(A<sub>i</sub>)↾J, so that ((A<sub>i</sub>↾K<sub>i</sub>)<sup>u′<sub>i</sub></sup>)<sup>n</sup><sub>i=1</sub> ⊢ A↾J is obtained by an axiom π′ with π̲′<sub>x⃗</sub> = x<sub>i</sub> = π̲<sub>x⃗</sub>.

Assume that π ends with a ⇒-introduction rule:

$$\frac{(A\_i^{u\_i})\_{i=1}^{n+1} \vdash B}{(A\_i^{u\_i})\_{i=1}^n \vdash A\_{n+1} \Rightarrow\_{u\_{n+1}} B}$$

with <sup>A</sup> = (A<sup>n</sup>+1 <sup>⇒</sup><sup>u</sup>n+1 <sup>B</sup>), and we have <sup>π</sup>−→<sup>x</sup> <sup>=</sup> λx<sup>n</sup>+1 <sup>ρ</sup>−→x ,xn+1 . With the notations of the lemma we have A-<sup>J</sup> = (A<sup>n</sup>+1-<sup>K</sup>n+1 ⇒<sup>u</sup>- <sup>n</sup>+1 <sup>B</sup>-<sup>J</sup> ). By inductive hypothesis there is a proof ρ of (A<sup>i</sup>- u- i K<sup>i</sup> ) n+1 <sup>i</sup>=1 <sup>B</sup>-<sup>J</sup> such that <sup>ρ</sup> −→x ,xn+1 <sup>=</sup> <sup>ρ</sup>−→x ,xn+1 and hence we have a proof π of (A<sup>i</sup>- u- i K<sup>i</sup> )n <sup>i</sup>=1 <sup>A</sup>-<sup>J</sup> with <sup>π</sup> −→<sup>x</sup> <sup>=</sup> λx<sup>n</sup>+1 <sup>ρ</sup> −→x ,xn+1 = <sup>π</sup>−→<sup>x</sup> as contended.

Assume last that $\pi$ ends with a $\Rightarrow$-elimination rule:

$$\frac{\begin{array}{cc}\mu & \rho\\ (B_i^{v_i})_{i=1}^n \vdash B \Rightarrow_v A & (C_i^{w_i})_{i=1}^n \vdash B\end{array}}{(A_i^{u_i})_{i=1}^n \vdash A}$$

with $\mathsf{d}(A_i) = \mathsf{d}(B_i) + \mathsf{d}(C_i)$, $B_i = A_i{\restriction}_{\mathsf{d}(B_i)}$ and $C_i = A_i{\restriction}_{\mathsf{d}(C_i)}$, $u_i{\restriction}_{\mathsf{d}(B_i)} = v_i$ and $u_i{\restriction}_{\mathsf{d}(C_i)} = v \circ w_i$ for $i = 1,\dots,n$, and of course $\pi_{\vec{x}} = \mu_{\vec{x}}\,\rho_{\vec{x}}$. Let $L = v^{-1}(J) \subseteq \mathsf{d}(B)$. Let $L_i = v_i^{-1}(J)$ and $R_i = w_i^{-1}(L)$ for $i = 1,\dots,n$ (we also set $v'_i = v_i{\restriction}_{L_i}$, $w'_i = w_i{\restriction}_{R_i}$ and $v' = v{\restriction}_L$). By inductive hypothesis, we have a proof $\mu'$ of $((B_i{\restriction}_{L_i})^{v'_i})_{i=1}^n \vdash B{\restriction}_L \Rightarrow_{v'} A{\restriction}_J$ such that $\mu'_{\vec{x}} = \mu_{\vec{x}}$, and a proof $\rho'$ of $((C_i{\restriction}_{R_i})^{w'_i})_{i=1}^n \vdash B{\restriction}_L$ such that $\rho'_{\vec{x}} = \rho_{\vec{x}}$. Now, setting $K_i = u_i^{-1}(J)$, observe that

$$K_i = (u_i{\restriction}_{\mathsf{d}(B_i)})^{-1}(J) + (u_i{\restriction}_{\mathsf{d}(C_i)})^{-1}(J) = v_i^{-1}(J) + w_i^{-1}(v^{-1}(J)) = L_i + R_i\,.$$

It follows that $\mathsf{d}(A_i{\restriction}_{K_i}) = L_i + R_i$, and, setting $u'_i = u_i{\restriction}_{K_i}$, we have $u'_i{\restriction}_{L_i} = v'_i$ and $u'_i{\restriction}_{R_i} = v' \circ w'_i$. Hence we have a proof $\pi'$ of $((A_i{\restriction}_{K_i})^{u'_i})_{i=1}^n \vdash A{\restriction}_J$ such that $\pi'_{\vec{x}} = \mu'_{\vec{x}}\,\rho'_{\vec{x}} = \mu_{\vec{x}}\,\rho_{\vec{x}} = \pi_{\vec{x}}$ as contended. ✷

Though substitution lemmas are usually trivial, the LJ(I) substitution lemma requires some care in its statement and proof<sup>6</sup>.

Lemma 4 (Substitution). Assume that $(A_j^{u_j})_{j=1}^n \vdash A$ with a proof $\mu$ and that, for some $i \in \{1,\dots,n\}$, $(B_j^{v_j})_{j=1}^{n-1} \vdash A_i$ with a proof $\rho$. Then there is a proof $\pi$ of $(C_j^{w_j})_{j=1}^{n-1} \vdash A$ such that $\pi_{(\vec{x})_{\setminus i}} = \mu_{\vec{x}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$ as soon as $\mathsf{d}(C_j) = \mathsf{d}(A_{s(j,i)}) + \mathsf{d}(B_j)$ for each $j = 1,\dots,n-1$ (remember that this requires also that $\mathsf{d}(A_{s(j,i)}) \cap \mathsf{d}(B_j) = \emptyset$) with:

<sup>6</sup> We use notations introduced in Section 1, especially for s(j, i).

$$\begin{array}{l} -\ C_j{\restriction}_{\mathsf{d}(A_{s(j,i)})} = A_{s(j,i)} \text{ and } w_j{\restriction}_{\mathsf{d}(A_{s(j,i)})} = u_{s(j,i)},\\ -\ C_j{\restriction}_{\mathsf{d}(B_j)} = B_j \text{ and } w_j{\restriction}_{\mathsf{d}(B_j)} = u_i \circ v_j. \end{array}$$

Proof. By induction on the proof $\mu$. Assume that $\mu$ is an axiom, so that there is a $k \in \{1,\dots,n\}$ such that $A = u_{k*}(A_k)$, $u_k$ is a bijection and $\mathsf{d}(A_j) = \emptyset$ for all $j \neq k$. In that case we have $\mu_{\vec{x}} = x_k$. There are two subcases to consider. Assume first that $k = i$. By Lemma 2 there is a proof $\rho'$ of $(B_j^{u_i \circ v_j})_{j=1}^{n-1} \vdash u_{i*}(A_i)$ such that $\rho'_{(\vec{x})_{\setminus i}} = \rho_{(\vec{x})_{\setminus i}}$. We have $C_j = B_j$ and $w_j = u_i \circ v_j$ for $j = 1,\dots,n-1$, so that $\rho'$ is a proof of $(C_j^{w_j})_{j=1}^{n-1} \vdash A$; so we take $\pi = \rho'$, and the equation $\pi_{(\vec{x})_{\setminus i}} = \mu_{\vec{x}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$ holds since $\mu_{\vec{x}} = x_i$. Assume next that $k \neq i$; then $\mathsf{d}(A_i) = \emptyset$ and hence $\mathsf{d}(B_j) = \emptyset$ (and $v_j = 0_\emptyset$) for $j = 1,\dots,n-1$. Therefore $C_j = A_{s(j,i)}$ and $w_j = u_{s(j,i)}$ for $j = 1,\dots,n-1$.
So our target sequent $(C_j^{w_j})_{j=1}^{n-1} \vdash A$ can also be written $(A_{s(j,i)}^{u_{s(j,i)}})_{j=1}^{n-1} \vdash u_{k*}(A_k)$ and is provable by a proof $\pi$ such that $\pi_{(\vec{x})_{\setminus i}} = x_k$ as contended.
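As a reminder of the notation from Section 1 (that section is not reproduced here, so the following is stated under the standard convention for such re-indexings, an assumption of ours): $s(j,i)$ enumerates the context once the $i$-th hypothesis has been removed, skipping $i$:

$$s(j,i) = \begin{cases} j & \text{if } j < i,\\ j+1 & \text{if } j \geq i. \end{cases}$$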

Assume now that $\mu$ ends with a $\Rightarrow$-introduction, that is, $A = (A_{n+1} \Rightarrow_{u_{n+1}} A')$ and $\mu$ is

$$\frac{\begin{array}{c}\theta\\ (A_j^{u_j})_{j=1}^{n+1} \vdash A'\end{array}}{(A_j^{u_j})_{j=1}^{n} \vdash A_{n+1} \Rightarrow_{u_{n+1}} A'}$$

We set $B_n = A_{n+1}{\restriction}_\emptyset$ and of course $v_n = 0_{\mathsf{d}(A_i)}$. Then we have a proof $\rho'$ of $(B_j^{v_j})_{j=1}^{n} \vdash A_i$ such that $\rho'_{(\vec{x})_{\setminus i},x_{n+1}} = \rho_{(\vec{x})_{\setminus i}}$ by Lemma 1. We set $C_n = A_{n+1}$ and $w_n = u_{n+1}$. Then by inductive hypothesis applied to $\theta$ we have a proof $\pi_0$ of $(C_j^{w_j})_{j=1}^{n} \vdash A'$ which satisfies $(\pi_0)_{(\vec{x})_{\setminus i},x_{n+1}} = \theta_{\vec{x},x_{n+1}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$, and applying a $\Rightarrow$-introduction rule we get a proof $\pi$ of $(C_j^{w_j})_{j=1}^{n-1} \vdash A$ such that $\pi_{(\vec{x})_{\setminus i}} = \lambda x_{n+1}\,(\theta_{\vec{x},x_{n+1}}[\rho_{(\vec{x})_{\setminus i}}/x_i]) = \mu_{\vec{x}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$ as expected.

Assume last that the proof $\mu$ ends with

$$\frac{\begin{array}{cc}\varphi & \psi\\ (E_j^{s_j})_{j=1}^n \vdash E \Rightarrow_s A & (F_j^{t_j})_{j=1}^n \vdash E\end{array}}{(A_j^{u_j})_{j=1}^n \vdash A}$$

with $\mathsf{d}(A_j) = \mathsf{d}(E_j) + \mathsf{d}(F_j)$, $A_j{\restriction}_{\mathsf{d}(E_j)} = E_j$, $A_j{\restriction}_{\mathsf{d}(F_j)} = F_j$, $u_j{\restriction}_{\mathsf{d}(E_j)} = s_j$ and $u_j{\restriction}_{\mathsf{d}(F_j)} = s \circ t_j$, for $j = 1,\dots,n$. And we have $\mu_{\vec{x}} = \varphi_{\vec{x}}\,\psi_{\vec{x}}$. The idea is to "share" the substituting proof $\rho$ of $(B_j^{v_j})_{j=1}^{n-1} \vdash A_i$ among $\varphi$ and $\psi$ according to what they need, as specified by the formulas $E_i$ and $F_i$. So we write $\mathsf{d}(B_j) = L_j + R_j$ where $L_j = v_j^{-1}(\mathsf{d}(E_i))$ and $R_j = v_j^{-1}(\mathsf{d}(F_i))$, and by Lemma 3 we have two proofs $\rho_L$ of $((B_j{\restriction}_{L_j})^{v_j^L})_{j=1}^{n-1} \vdash E_i$ and $\rho_R$ of $((B_j{\restriction}_{R_j})^{v_j^R})_{j=1}^{n-1} \vdash F_i$, where we set $v_j^L = v_j{\restriction}_{L_j}$ and $v_j^R = v_j{\restriction}_{R_j}$, obtained from $\rho$ by restriction. These proofs satisfy $(\rho_L)_{(\vec{x})_{\setminus i}} = (\rho_R)_{(\vec{x})_{\setminus i}} = \rho_{(\vec{x})_{\setminus i}}$.

Now we want to apply the inductive hypothesis to $\varphi$ and $\rho_L$, in order to get a proof of the sequent $(G_j^{w_j^L})_{j=1}^{n-1} \vdash E \Rightarrow_s A$ where $G_j = C_j{\restriction}_{\mathsf{d}(E_{s(j,i)})+L_j}$ (observe indeed that $\mathsf{d}(E_{s(j,i)}) \subseteq \mathsf{d}(A_{s(j,i)})$ and $L_j \subseteq \mathsf{d}(B_j)$, and hence they are disjoint by our assumption that $\mathsf{d}(C_j) = \mathsf{d}(A_{s(j,i)}) + \mathsf{d}(B_j)$) and $w_j^L = w_j{\restriction}_{\mathsf{d}(E_{s(j,i)})+L_j}$. With these definitions, and by our assumptions about $C_j$ and $w_j$, we have for all $j = 1,\dots,n-1$

$$\begin{aligned} G_j{\restriction}_{\mathsf{d}(E_{s(j,i)})} &= C_j{\restriction}_{\mathsf{d}(A_{s(j,i)})}{\restriction}_{\mathsf{d}(E_{s(j,i)})} = A_{s(j,i)}{\restriction}_{\mathsf{d}(E_{s(j,i)})} = E_{s(j,i)}\\ w_j^L{\restriction}_{\mathsf{d}(E_{s(j,i)})} &= w_j{\restriction}_{\mathsf{d}(A_{s(j,i)})}{\restriction}_{\mathsf{d}(E_{s(j,i)})} = u_{s(j,i)}{\restriction}_{\mathsf{d}(E_{s(j,i)})} = s_{s(j,i)}\\ G_j{\restriction}_{L_j} &= C_j{\restriction}_{\mathsf{d}(B_j)}{\restriction}_{L_j} = B_j{\restriction}_{L_j}\\ w_j^L{\restriction}_{L_j} &= w_j{\restriction}_{\mathsf{d}(B_j)}{\restriction}_{L_j} = (u_i \circ v_j){\restriction}_{L_j} = u_i{\restriction}_{\mathsf{d}(E_i)} \circ v_j^L = s_i \circ v_j^L\,. \end{aligned}$$

Therefore the inductive hypothesis applies, yielding a proof $\varphi'$ of $(G_j^{w_j^L})_{j=1}^{n-1} \vdash E \Rightarrow_s A$ such that $\varphi'_{(\vec{x})_{\setminus i}} = \varphi_{\vec{x}}[(\rho_L)_{(\vec{x})_{\setminus i}}/x_i] = \varphi_{\vec{x}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$.

Next we want to apply the inductive hypothesis to $\psi$ and $\rho_R$, in order to get a proof of the sequent $(H_j^{r_j})_{j=1}^{n-1} \vdash E$ where, for $j = 1,\dots,n-1$, $H_j = C_j{\restriction}_{\mathsf{d}(F_{s(j,i)})+R_j}$ (again $\mathsf{d}(F_{s(j,i)}) \subseteq \mathsf{d}(A_{s(j,i)})$ and $R_j \subseteq \mathsf{d}(B_j)$ are disjoint by our assumption that $\mathsf{d}(C_j) = \mathsf{d}(A_{s(j,i)}) + \mathsf{d}(B_j)$) and $r_j$ is defined by $r_j{\restriction}_{\mathsf{d}(F_{s(j,i)})} = t_{s(j,i)}$ and $r_j{\restriction}_{R_j} = t_i \circ v_j^R$. Remember indeed that $v_j^R : R_j \to \mathsf{d}(F_i)$ and $t_i : \mathsf{d}(F_i) \to \mathsf{d}(E)$. We have

$$\begin{aligned} H_j{\restriction}_{\mathsf{d}(F_{s(j,i)})} &= C_j{\restriction}_{\mathsf{d}(A_{s(j,i)})}{\restriction}_{\mathsf{d}(F_{s(j,i)})} = A_{s(j,i)}{\restriction}_{\mathsf{d}(F_{s(j,i)})} = F_{s(j,i)}\\ H_j{\restriction}_{R_j} &= C_j{\restriction}_{\mathsf{d}(B_j)}{\restriction}_{R_j} = B_j{\restriction}_{R_j} \end{aligned}$$

and hence by inductive hypothesis there is a proof $\psi'$ of $(H_j^{r_j})_{j=1}^{n-1} \vdash E$ such that $\psi'_{(\vec{x})_{\setminus i}} = \psi_{\vec{x}}[(\rho_R)_{(\vec{x})_{\setminus i}}/x_i] = \psi_{\vec{x}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$.

To end the proof of the lemma, it will be sufficient to prove that we can apply a $\Rightarrow$-elimination rule to the sequents $(G_j^{w_j^L})_{j=1}^{n-1} \vdash E \Rightarrow_s A$ and $(H_j^{r_j})_{j=1}^{n-1} \vdash E$ in order to get a proof $\pi$ of the sequent $(C_j^{w_j})_{j=1}^{n-1} \vdash A$. Indeed, the proof $\pi$ obtained in that way will satisfy $\pi_{(\vec{x})_{\setminus i}} = \varphi'_{(\vec{x})_{\setminus i}}\,\psi'_{(\vec{x})_{\setminus i}} = \mu_{\vec{x}}[\rho_{(\vec{x})_{\setminus i}}/x_i]$. Let $j \in \{1,\dots,n-1\}$. We have $C_j{\restriction}_{\mathsf{d}(G_j)} = G_j$ and $C_j{\restriction}_{\mathsf{d}(H_j)} = H_j$ simply because $G_j$ and $H_j$ are defined by restricting $C_j$. Moreover $\mathsf{d}(G_j) = \mathsf{d}(E_{s(j,i)}) + L_j$ and $\mathsf{d}(H_j) = \mathsf{d}(F_{s(j,i)}) + R_j$. Therefore $\mathsf{d}(G_j) \cap \mathsf{d}(H_j) = \emptyset$ and

$$\mathsf{d}(C_j) = \mathsf{d}(A_{s(j,i)}) + \mathsf{d}(B_j) = \mathsf{d}(E_{s(j,i)}) + \mathsf{d}(F_{s(j,i)}) + L_j + R_j = \mathsf{d}(G_j) + \mathsf{d}(H_j)\,.$$

We have $w_j{\restriction}_{\mathsf{d}(G_j)} = w_j^L$ by definition of $w_j^L$ as $w_j{\restriction}_{\mathsf{d}(E_{s(j,i)})+L_j}$. We have

$$\begin{aligned} w\_j \restriction\_{\mathsf{d}(H\_j)} \restriction\_{\mathsf{d}(F\_{\mathsf{s}(j,i)})} &= w\_j \restriction\_{\mathsf{d}(A\_{\mathsf{s}(j,i)})} \restriction\_{\mathsf{d}(F\_{\mathsf{s}(j,i)})} = u\_{\mathsf{s}(j,i)} \restriction\_{\mathsf{d}(F\_{\mathsf{s}(j,i)})} \\ &= s \circ t\_{\mathsf{s}(j,i)} = (s \circ r\_j) \restriction\_{\mathsf{d}(F\_{\mathsf{s}(j,i)})} \\ w\_j \restriction\_{\mathsf{d}(H\_j)} \restriction\_{R\_j} &= w\_j \restriction\_{\mathsf{d}(B\_j)} \restriction\_{R\_j} = (u\_i \circ v\_j) \restriction\_{R\_j} \\ &= u\_i \restriction\_{\mathsf{d}(F\_i)} \circ v\_j^R = s \circ t\_i \circ v\_j^R = s \circ r\_j \restriction\_{R\_j} = (s \circ r\_j) \restriction\_{R\_j} \end{aligned}$$

and therefore $w_j{\restriction}_{\mathsf{d}(H_j)} = s \circ r_j$ as required. ✷

We shall often use the two following consequences of the Substitution Lemma.

Lemma 5. Given a proof $\mu$ of $(A_j^{u_j})_{j=1}^n \vdash A$ and a proof $\rho$ of $B^v \vdash A_i$ (for some $i \in \{1,\dots,n\}$), there is a proof $\pi$ of $(A_j^{u_j})_{j=1}^{i-1}, B^{u_i \circ v}, (A_j^{u_j})_{j=i+1}^{n} \vdash A$ such that $\pi_{\vec{x}} = \mu_{\vec{x}}[\rho_{x_i}/x_i]$.

Proof. By weakening we have a proof $\mu'$ of $(A_j^{u_j})_{j=1}^{i}, B{\restriction}_\emptyset^{0_{\mathsf{d}(A)}}, (A_j^{u_j})_{j=i+1}^{n} \vdash A$ such that $\mu'_{\vec{x}} = \mu_{(\vec{x})_{\setminus i+1}}$ (where $\vec{x}$ is a list of pairwise distinct variables of length $n+1$), as well as a proof $\rho'$ of $(A_j{\restriction}_\emptyset^{0_{\mathsf{d}(A_i)}})_{j=1}^{i}, B^v, (A_j{\restriction}_\emptyset^{0_{\mathsf{d}(A_i)}})_{j=i+1}^{n} \vdash A_i$ such that $\rho'_{\vec{x}} = \rho_{x_{i+1}}$. By Lemma 4, we have a proof $\pi$ of $(A_j^{u_j})_{j=1}^{i-1}, B^{u_i \circ v}, (A_j^{u_j})_{j=i+1}^{n} \vdash A$ which satisfies $\pi_{(\vec{x})_{\setminus i}} = \mu'_{\vec{x}}[\rho'_{(\vec{x})_{\setminus i}}/x_i] = \mu_{\vec{x}}[\rho_{x_i}/x_i]$. ✷

Lemma 6. Given a proof $\mu$ of $A^v \vdash B$ and a proof $\rho$ of $(A_j^{u_j})_{j=1}^n \vdash A$, there is a proof $\pi$ of $(A_j^{v \circ u_j})_{j=1}^n \vdash B$ such that $\pi_{\vec{x}} = \mu_x[\rho_{\vec{x}}/x]$.

The proof is similar to the previous one.

If $A$ and $B$ are formulas such that $\underline{A} = \underline{B}$, $\mathsf{d}(A) = \mathsf{d}(B)$ and $\langle A \rangle_j = \langle B \rangle_j$ for all $j \in \mathsf{d}(A)$, we say that $A$ and $B$ are similar and we write $A \sim B$. One fundamental property of our deduction system is that two formulas which represent the same family of intersection types are logically equivalent.

Theorem 1. If $A \sim B$ then $A^{\mathsf{Id}} \vdash B$ with a proof $\pi$ such that $\pi_x \sim_\eta x$.

Proof. Assume that $A = \alpha[f]$; then we have $B = A$ and $A^{\mathsf{Id}} \vdash B$ is an axiom.

Assume that $A = (C \Rightarrow_u D)$ and $B = (E \Rightarrow_v F)$. We have $D \sim F$ and hence $D^{\mathsf{Id}} \vdash F$ with a proof $\rho$ such that $\rho_x \sim_\eta x$. And there is a bijection $w : \mathsf{d}(E) \to \mathsf{d}(C)$ such that $w_*(E) \sim C$ and $u \circ w = v$. By inductive hypothesis we have a proof $\mu$ of $w_*(E)^{\mathsf{Id}} \vdash C$ such that $\mu_y \sim_\eta y$, and hence using the axiom $E^w \vdash w_*(E)$ and Lemma 5 we have a proof $\mu'$ of $E^w \vdash C$ such that $\mu'_x = \mu_x$.

There is a proof $\pi_1$ of $(C \Rightarrow_u D)^{\mathsf{Id}}, C^u \vdash D$ such that $(\pi_1)_{x,y} = (x)\,y$ (consider the two axioms $(C \Rightarrow_u D)^{\mathsf{Id}}, C{\restriction}_\emptyset^{0_{\mathsf{d}(D)}} \vdash C \Rightarrow_u D$ and $(C \Rightarrow_u D){\restriction}_\emptyset^{0_{\mathsf{d}(C)}}, C^{\mathsf{Id}} \vdash C$ and use a $\Rightarrow$-elimination rule). So by Lemma 5 there is a proof $\pi_2$ of $(C \Rightarrow_u D)^{\mathsf{Id}}, E^{u \circ w} \vdash D$, that is, of $(C \Rightarrow_u D)^{\mathsf{Id}}, E^{v} \vdash D$, such that $(\pi_2)_{x,y} = (x)\,\mu_y$. Applying Lemma 6 we get a proof $\pi_3$ of $(C \Rightarrow_u D)^{\mathsf{Id}}, E^{v} \vdash F$ such that $(\pi_3)_{x,y} = \rho_z[(x)\,\mu_y/z]$. We get the expected proof $\pi$ by a $\Rightarrow$-introduction rule, so that $\pi_x = \lambda y\,\rho_z[(x)\,\mu_y/z]$. By inductive hypothesis $\pi_x \sim_\eta x$. ✷
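To see concretely why $\pi_x \sim_\eta x$ here, it may help to unfold the simplest instance of the construction above: when $\rho_z$ and $\mu_y$ are literally the variables $z$ and $y$ (rather than proper $\eta$-expansions of them), the term built for $\pi$ is exactly an $\eta$-expansion:

$$\pi_x = \lambda y\,\rho_z[(x)\,\mu_y/z] = \lambda y\,(x)\,y \sim_\eta x\,.$$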

#### 3.4 Relation between intersection types and LJ(I)

Now we explain the precise connection between non-idempotent intersection types and our logical system LJ(I). This connection consists of two statements: soundness (Theorem 2) and completeness (Theorem 3).

Theorem 2 (Soundness). Let $\pi$ be a deduction tree of the sequent $(A_i^{u_i})_{i=1}^n \vdash B$ and $\vec{x}$ a sequence of $n$ pairwise distinct variables. Then the $\lambda$-term $\pi_{\vec{x}}$ satisfies $(x_i : \langle A_i^{u_i} \rangle_j : \underline{A_i})_{i=1}^n \vdash \pi_{\vec{x}} : \langle B \rangle_j : \underline{B}$ in the intersection type system, for each $j \in \mathsf{d}(B)$.

Proof. We proceed by induction on $\pi$ (in the course of this induction, we recall the precise definition of $\pi_{\vec{x}}$). If $\pi$ is the proof

$$\frac{q \neq i \Rightarrow \mathbf{d}(A\_q) = \emptyset \text{ and } u\_i \text{ is a bijection}}{(A\_q^{u\_q})\_{q=1}^n \vdash u\_{i\*}(A\_i)}$$

(so that $B = u_{i*}(A_i)$) then $\pi_{\vec{x}} = x_i$. We have $\langle A_q^{u_q} \rangle_j = []$ if $q \neq i$, $\langle A_i^{u_i} \rangle_j = [\langle A_i \rangle_{u_i^{-1}(j)}]$ and $\langle u_{i*}(A_i) \rangle_j = \langle A_i \rangle_{u_i^{-1}(j)}$. It follows that $(x_q : \langle A_q^{u_q} \rangle_j : \underline{A_q})_{q=1}^n \vdash x_i : \langle B \rangle_j : \underline{B}$ is a valid axiom in the intersection type system.

Assume that π is the proof

$$\frac{\begin{array}{c}\pi_0\\ A_1^{u_1}, \dots, A_n^{u_n}, A^u \vdash B\end{array}}{A_1^{u_1}, \dots, A_n^{u_n} \vdash A \Rightarrow_u B}$$

where $\pi_0$ is the proof of the premise of the last rule of $\pi$. By inductive hypothesis the $\lambda$-term $(\pi_0)_{\vec{x},x}$ satisfies $(x_i : \langle A_i^{u_i} \rangle_j : \underline{A_i})_{i=1}^n, x : \langle A^u \rangle_j : \underline{A} \vdash (\pi_0)_{\vec{x},x} : \langle B \rangle_j : \underline{B}$, from which we deduce $(x_i : \langle A_i^{u_i} \rangle_j : \underline{A_i})_{i=1}^n \vdash \lambda x^{\underline{A}}\,(\pi_0)_{\vec{x},x} : (\langle A^u \rangle_j, \langle B \rangle_j) : \underline{A} \Rightarrow \underline{B}$, which is the required judgment since $\pi_{\vec{x}} = \lambda x^{\underline{A}}\,(\pi_0)_{\vec{x},x}$ and $(\langle A^u \rangle_j, \langle B \rangle_j) = \langle A \Rightarrow_u B \rangle_j$ as easily checked.

Assume last that π ends with

$$\frac{\begin{array}{cc}\pi^1 & \pi^2\\ C_1^{u_1}, \dots, C_n^{u_n} \vdash A \Rightarrow_u B & D_1^{v_1}, \dots, D_n^{v_n} \vdash A\end{array}}{E_1^{w_1}, \dots, E_n^{w_n} \vdash B}$$

with: for each $i = 1,\dots,n$ there are two disjoint sets $L_i$ and $R_i$ such that $\mathsf{d}(E_i) = L_i + R_i$, $C_i = E_i{\restriction}_{L_i}$, $D_i = E_i{\restriction}_{R_i}$, $w_i{\restriction}_{L_i} = u_i$, and $w_i{\restriction}_{R_i} = u \circ v_i$.

Let $j \in \mathsf{d}(B)$. By inductive hypothesis, the judgment $(x_i : \langle C_i^{u_i} \rangle_j : \underline{C_i})_{i=1}^n \vdash \pi^1_{\vec{x}} : \langle A \Rightarrow_u B \rangle_j : \underline{A} \Rightarrow \underline{B}$ is derivable in the intersection type system. Let $K_j = u^{-1}(\{j\})$, which is a finite subset of $\mathsf{d}(A)$. By inductive hypothesis again, for each $k \in K_j$ we have $(x_i : \langle D_i^{v_i} \rangle_k : \underline{D_i})_{i=1}^n \vdash \pi^2_{\vec{x}} : \langle A \rangle_k : \underline{A}$. Now observe that $\langle A \Rightarrow_u B \rangle_j = ([\langle A \rangle_k \mid k \in K_j], \langle B \rangle_j)$, so that

$$(x_i : \langle C_i^{u_i} \rangle_j + \sum_{k \in K_j} \langle D_i^{v_i} \rangle_k : \underline{E_i})_{i=1}^n \vdash (\pi^1_{\vec{x}})\,\pi^2_{\vec{x}} : \langle B \rangle_j : \underline{B}$$

is derivable in intersection types (remember that $\underline{C_i} = \underline{D_i} = \underline{E_i}$). Since $\pi_{\vec{x}} = (\pi^1_{\vec{x}})\,\pi^2_{\vec{x}}$ it will be sufficient to prove that

$$
\langle E\_i^{w\_i} \rangle\_j = \langle C\_i^{u\_i} \rangle\_j + \sum\_{k \in K\_j} \langle D\_i^{v\_i} \rangle\_k \,. \tag{2}
$$

For this, since $\langle E_i^{w_i} \rangle_j = [\langle E_i \rangle_l \mid w_i(l) = j]$, consider an element $l$ of $\mathsf{d}(E_i)$ such that $w_i(l) = j$. There are two possibilities: (1) either $l \in L_i$, and in that case we know that $\langle E_i \rangle_l = \langle C_i \rangle_l$ since $E_i{\restriction}_{L_i} = C_i$, and moreover we have $u_i(l) = w_i(l) = j$; (2) or $l \in R_i$, and in that case we have $\langle E_i \rangle_l = \langle D_i \rangle_l$ since $E_i{\restriction}_{R_i} = D_i$. Moreover $u(v_i(l)) = w_i(l) = j$ and hence $v_i(l) \in K_j$. Therefore

$$\begin{aligned} [\langle E_i \rangle_l \mid l \in L_i \text{ and } w_i(l) = j] &= [\langle C_i \rangle_l \mid u_i(l) = j] = \langle C_i^{u_i} \rangle_j \\ [\langle E_i \rangle_l \mid l \in R_i \text{ and } w_i(l) = j] &= [\langle D_i \rangle_l \mid v_i(l) \in K_j] = \sum_{k \in K_j} \langle D_i^{v_i} \rangle_k \end{aligned}$$

and (2) follows. ✷
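A small made-up instance may clarify the bookkeeping behind (2); the concrete index sets below are illustrative assumptions, not data from the paper. Take $\mathsf{d}(E_i) = \{1,2,3\}$ with $L_i = \{1\}$, $R_i = \{2,3\}$, and $K_j = u^{-1}(\{j\}) = \{k_1,k_2\}$, with $u_i(1) = j$, $v_i(2) = k_1$ and $v_i(3) = k_2$, so that $w_i(l) = j$ for $l = 1,2,3$. Then

$$\langle E_i^{w_i} \rangle_j = [\langle E_i \rangle_1, \langle E_i \rangle_2, \langle E_i \rangle_3] = \underbrace{[\langle C_i \rangle_1]}_{\langle C_i^{u_i} \rangle_j} + \underbrace{[\langle D_i \rangle_2]}_{\langle D_i^{v_i} \rangle_{k_1}} + \underbrace{[\langle D_i \rangle_3]}_{\langle D_i^{v_i} \rangle_{k_2}}\,.$$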

Theorem 3 (Completeness). Let $J \subseteq I$. Let $M$ be a $\lambda$-term and $x_1,\dots,x_n$ be pairwise distinct variables, such that $(x_i : m_i^j : \sigma_i)_{i=1}^n \vdash M : b^j : \tau$ in the intersection type system for all $j \in J$. Let $A_1,\dots,A_n$ and $B$ be formulas and let $u_1,\dots,u_n$ be almost injective functions such that $u_i : \mathsf{d}(A_i) \to J = \mathsf{d}(B)$. Assume also that $\underline{A_i} = \sigma_i$ for each $i = 1,\dots,n$ and that $\underline{B} = \tau$. Last, assume that, for all $j \in J$, one has $\langle B \rangle_j = b^j$ and $\langle A_i^{u_i} \rangle_j = m_i^j$ for $i = 1,\dots,n$. Then the judgment $(A_i^{u_i})_{i=1}^n \vdash B$ has a proof $\pi$ such that $\pi_{\vec{x}} \sim_\eta M$.

Proof. By induction on $M$. Assume first that $M = x_i$ for some $i \in \{1,\dots,n\}$. Then we must have $\tau = \sigma_i$, $m_q^j = []$ for $q \neq i$ and $m_i^j = [b^j]$ for all $j \in J$. Therefore $\mathsf{d}(A_q) = \emptyset$ and $u_q$ is the empty function for $q \neq i$, $u_i$ is a bijection $\mathsf{d}(A_i) \to J$ and $\forall k \in \mathsf{d}(A_i)\ \langle A_i \rangle_k = b^{u_i(k)}$; in other words $u_{i*}(A_i) \sim B$. By Theorem 1 we know that the judgment $(u_{i*}(A_i))^{\mathsf{Id}} \vdash B$ is provable in LJ(I) with a proof $\rho$ such that $\rho_x \sim_\eta x$. We have a proof $\theta$ of $(A_i^{u_i})_{i=1}^n \vdash u_{i*}(A_i)$ which consists of an axiom, so that $\theta_{\vec{x}} = x_i$, and hence by Lemma 6 we have a proof $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash B$ such that $\pi_{\vec{x}} = \rho_x[\theta_{\vec{x}}/x] \sim_\eta x_i$.

Assume that $M = \lambda x^\sigma\,N$, that $\tau = (\sigma \Rightarrow \varphi)$ and that we have a family of deductions (for $j \in J$) of $(x_i : m_i^j : \sigma_i)_{i=1}^n \vdash M : (m^j, c^j) : \sigma \Rightarrow \varphi$ with $b^j = (m^j, c^j)$; the premise of this conclusion in each of these deductions is $(x_i : m_i^j : \sigma_i)_{i=1}^n, x : m^j : \sigma \vdash N : c^j : \varphi$. We must have $B = (C \Rightarrow_u D)$ with $\underline{D} = \varphi$, $\underline{C} = \sigma$, $\mathsf{d}(D) = J$, $u : \mathsf{d}(C) \to \mathsf{d}(D)$ almost injective, $\langle D \rangle_j = c^j$ and $[\langle C \rangle_k \mid k \in \mathsf{d}(C) \text{ and } u(k) = j] = m^j$, that is $\langle C^u \rangle_j = m^j$, for each $j \in J$. By inductive hypothesis we have a proof $\rho$ of $(A_i^{u_i})_{i=1}^n, C^u \vdash D$ such that $\rho_{\vec{x},x} \sim_\eta N$, from which we obtain a proof $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash C \Rightarrow_u D$ such that $\pi_{\vec{x}} = \lambda x^\sigma\,\rho_{\vec{x},x} \sim_\eta M$ as expected.

Assume last that $M = (N)\, P$ and that we have a $J$-indexed family of deductions $(x_i : m_i^j : \sigma_i)_{i=1}^n \vdash M : b^j : \tau$. Let $A_1,\ldots,A_n$, $u_1,\ldots,u_n$ and $B$ be LJ(I) formulas and almost injective functions as in the statement of the theorem.

Let $j \in J$. There is a finite set $L_j \subseteq I$ and multisets $m_i^{j,0}$, $(m_i^{j,l})_{l \in L_j}$ such that we have deductions<sup>7</sup> of $(x_i : m_i^{j,0} : \sigma_i)_{i=1}^n \vdash N : ([\, a_l^j \mid l \in L_j \,], b^j) : \sigma \Rightarrow \tau$ and, for each $l \in L_j$, of $(x_i : m_i^{j,l} : \sigma_i)_{i=1}^n \vdash P : a_l^j : \sigma$ with

$$m\_i^j = m\_i^{j,0} + \sum\_{l \in L\_j} m\_i^{j,l} \,. \tag{3}$$

We assume the finite sets $L_j$ to be pairwise disjoint (this is possible because $I$ is infinite) and we use $L$ for their union. Let $u : L \to J$ be the function which maps $l \in L$ to the unique $j$ such that $l \in L_j$; this function is almost injective. Let $A$ be an LJ(I) formula such that $\underline{A} = \sigma$, $\mathsf{d}(A) = L$ and $\langle A\rangle_l = a_l^{u(l)}$; such a formula exists by Proposition 1.

Let i ∈ {1,...,n}. For each j <sup>∈</sup> J we know that

$$[\, \langle A_i \rangle_r \mid r \in \mathsf{d}(A_i) \text{ and } u_i(r) = j \,] = m_i^{j} = m_i^{j,0} + \sum_{l \in L_j} m_i^{j,l}$$

and hence we can split the set $\mathsf{d}(A_i) \cap u_i^{-1}(\{j\})$ into disjoint subsets $R_i^{j,0}$ and $(R_i^{j,l})_{l \in L_j}$ in such a way that

$$[\, \langle A_i \rangle_r \mid r \in R_i^{j,0} \,] = m_i^{j,0} \quad \text{and} \quad \forall l \in L_j \;\; [\, \langle A_i \rangle_r \mid r \in R_i^{j,l} \,] = m_i^{j,l} \,.$$

We set $R_i^0 = \bigcup_{j \in J} R_i^{j,0}$; observe that this is a disjoint union because $R_i^{j,0} \subseteq u_i^{-1}(\{j\})$. Similarly we define $R_i^1 = \bigcup_{l \in L} R_i^{u(l),l}$, which is a disjoint union for the following reason: if $l, l' \in L$ satisfy $u(l) = u(l') = j$ then $R_i^{j,l}$ and $R_i^{j,l'}$ have been chosen disjoint, and if $u(l) = j$ and $u(l') = j'$ with $j \neq j'$ we have $R_i^{j,l} \subseteq u_i^{-1}(\{j\})$ and $R_i^{j',l'} \subseteq u_i^{-1}(\{j'\})$. Let $v_i : R_i^1 \to L$ be defined by: $v_i(r)$ is the unique $l \in L$ such that $r \in R_i^{u(l),l}$. Since each $R_i^{j,l}$ is finite, the function $v_i$ is almost injective. Moreover $u \circ v_i = u_i{\restriction}_{R_i^1}$.

We use $u'_i$ for the restriction of $u_i$ to $R_i^0$, so that $u'_i : R_i^0 \to J$. By inductive hypothesis we have $((A_i{\restriction}_{R_i^0})^{u'_i})_{i=1}^n \vdash A \Rightarrow_u B$ with a proof $\mu$ such that $\mu_{\vec{x}} \sim_\eta N$. Indeed $[\, \langle A_i{\restriction}_{R_i^0}\rangle_r \mid r \in R_i^0 \text{ and } u'_i(r) = j \,] = m_i^{j,0}$ and $\langle A \Rightarrow_u B\rangle_j = ([\, a_l^j \mid u(l) = j \,], b^j)$ for each $j \in J$. For the same reason we have $((A_i{\restriction}_{R_i^1})^{v_i})_{i=1}^n \vdash A$ with a proof $\rho$ such that $\rho_{\vec{x}} \sim_\eta P$. Indeed for each $l \in L = \mathsf{d}(A)$ we have

<sup>7</sup> Notice that our λ-calculus is in *Church style* and hence the type σ is uniquely determined by the sub-term N of M.

$[\, \langle A_i{\restriction}_{R_i^1}\rangle_r \mid v_i(r) = l \,] = m_i^{j,l}$ and $\langle A\rangle_l = a_l^j$ where $j = u(l)$. By an application rule we get a proof $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash B$ such that $\pi_{\vec{x}} = (\mu_{\vec{x}})\, \rho_{\vec{x}} \sim_\eta (N)\, P = M$ as contended. ✷

## 4 The untyped Scott case

Since intersection types usually apply to the pure λ-calculus, we move now to this setting by choosing in **Rel**! the set $R_\infty$ as model of the pure λ-calculus. The $R_\infty$ intersection typing system has the elements of $R_\infty$ as types, and the typing rules involve sequents of shape $(x_i : m_i)_{i=1}^n \vdash M : a$ where $m_i \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$ and $a \in R_\infty$.

We use $\Lambda$ for the set of terms of the pure λ-calculus, and $\Lambda_\Omega$ for the pure λ-calculus extended with a constant $\Omega$, subject to the two following $\leadsto_\omega$ reduction rules: $\lambda x\, \Omega \leadsto_\omega \Omega$ and $(\Omega)\, M \leadsto_\omega \Omega$. We use $\sim_{\eta\omega}$ for the least congruence on $\Lambda_\Omega$ which contains $\leadsto_\eta$ and $\leadsto_\omega$, and similarly for $\sim_{\beta\eta\omega}$. We define a family $(H(x))_{x \in \mathcal{V}}$ of subsets of $\Lambda_\Omega$, minimal such that, for any sequences $\vec{x} = (x_1,\ldots,x_n)$ and $\vec{y} = (y_1,\ldots,y_k)$ such that $\vec{x}, \vec{y}$ is repetition-free, and for any terms $M_i \in H(x_i)$ (for $i = 1,\ldots,n$), one has $\lambda\vec{x}\,\lambda\vec{y}\, (x)\, M_1 \cdots M_n\, O_1 \cdots O_l \in H(x)$ where $O_j \sim_\omega \Omega$ for $j = 1,\ldots,l$. Notice that $x \in H(x)$.

The typing rules of R<sup>∞</sup> are

$$\frac{}{x_1 : [\,], \ldots, x_i : [\, a \,], \ldots, x_n : [\,] \vdash x_i : a} \qquad \frac{\Phi, x : m \vdash M : a}{\Phi \vdash \lambda x\, M : (m, a)}$$

$$\frac{\Phi \vdash M : ([\, a_1, \ldots, a_k \,], b) \qquad (\Phi_j \vdash N : a_j)_{j=1}^{k}}{\Phi + \sum_{j=1}^{k} \Phi_j \vdash (M)\, N : b}$$

where we use the following convention: when we write $\Phi + \Psi$ it is assumed that $\Phi$ is of shape $(x_i : m_i)_{i=1}^n$ and $\Psi$ is of shape $(x_i : p_i)_{i=1}^n$, and then $\Phi + \Psi$ is $(x_i : m_i + p_i)_{i=1}^n$. This typing system is just a "proof-theoretic" rephrasing of the denotational semantics of the terms of $\Lambda_\Omega$ in $R_\infty$.
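The convention for $\Phi + \Psi$ is simply a pointwise sum of multisets. As a minimal sketch (the encoding of multisets as `Counter`s and of contexts as dicts is our assumption, not the paper's):

```python
from collections import Counter

def add_contexts(phi, psi):
    """Pointwise sum of typing contexts: (x_i : m_i) + (x_i : p_i)
    yields (x_i : m_i + p_i), multisets being encoded as Counters."""
    assert phi.keys() == psi.keys()  # both contexts mention the same variables
    return {x: phi[x] + psi[x] for x in phi}
```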

Proposition 2. Let $M, M' \in \Lambda_\Omega$ and $\vec{x} = (x_1,\ldots,x_n)$ be a list of pairwise distinct variables containing all the free variables of $M$ and $M'$. Let $m_i \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$ for $i = 1,\ldots,n$ and $b \in R_\infty$. If $M \sim_{\beta\eta\omega} M'$ then $(x_i : m_i)_{i=1}^n \vdash M : b$ iff $(x_i : m_i)_{i=1}^n \vdash M' : b$.

#### 4.1 Formulas

We define the associated formulas as follows, each formula $A$ being given together with $\mathsf{d}(A) \subseteq I$ and $\langle A\rangle \in R_\infty^{\mathsf{d}(A)}$.


We can consider that there is a type $o$ of pure λ-terms interpreted as $R_\infty$ in **Rel**!, such that $(o \Rightarrow o) = o$, and then for any formula $A$ we have $\underline{A} = o$.

Operations of restriction and relocation of formulas are the same as in Section 3 (setting $\varepsilon_J{\restriction}_K = \varepsilon_{J \cap K}$) and satisfy the same properties, for instance $\langle A{\restriction}_K\rangle = \langle A\rangle{\restriction}_K$, and one sets $u_*(\varepsilon_J) = \varepsilon_K$ if $u : J \to K$ is a bijection.

The deduction rules are exactly the same as those of Section 3, plus the axiom $\vdash \varepsilon_\emptyset$. With any deduction $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash B$ and sequence $\vec{x} = (x_1,\ldots,x_n)$ of pairwise distinct variables, we can associate a pure λ-term $\pi_{\vec{x}} \in \Lambda_\Omega$ defined exactly as in Section 3 (just drop the types associated with variables in abstractions). If $\pi$ consists of an instance of the additional axiom, we set $\pi_{\vec{x}} = \Omega$.

Lemma 7. Let $A, A_1,\ldots,A_n$ be formulas such that $\mathsf{d}(A) = \mathsf{d}(A_i) = \emptyset$. Then $(A_i^{0_\emptyset})_{i=1}^n \vdash A$ is provable by a proof $\pi$ which satisfies $\pi_{x_1,\ldots,x_n} \sim_\omega \Omega$.

The proof is a straightforward induction on $A$, using the additional axiom, Lemma 1 and the observation that if $\mathsf{d}(B \Rightarrow_u C) = \emptyset$ then $u = 0_\emptyset$.

One can easily define a size function $\mathrm{sz} : R_\infty \to \mathbb{N}$ such that $\mathrm{sz}(e) = 0$ and $\mathrm{sz}(([\, a_1,\ldots,a_k \,], a)) = \mathrm{sz}(a) + \sum_{i=1}^{k} (1 + \mathrm{sz}(a_i))$. First we have to prove an adapted version of Proposition 1; here it will be restricted to finite sets.
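For concreteness, the size function can be transcribed directly (a sketch; the encoding of $R_\infty$ elements as the atom `'e'` or nested pairs `(multiset, element)` is our assumption):

```python
def sz(x):
    """sz(e) = 0 and sz(([a_1,...,a_k], a)) = sz(a) + sum_i (1 + sz(a_i)).
    Elements of R_infinity are encoded as 'e' or as a pair (m, a),
    with m a tuple of elements (the multiset) and a an element."""
    if x == 'e':
        return 0
    m, a = x
    return sz(a) + sum(1 + sz(ai) for ai in m)
```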

Proposition 3. Let $J$ be a finite subset of $I$ and $f \in R_\infty^{J}$. There is a formula $A$ such that $\mathsf{d}(A) = J$ and $\langle A\rangle = f$.

Proof. Observe that, since $J$ is finite, there is an $N \in \mathbb{N}$ such that $\forall j \in J\; \forall q \in \mathbb{N}\;\; q \geq N \Rightarrow f(j)_q = [\,]$ (remember that $f(j) \in \mathcal{M}_{\mathrm{fin}}(R_\infty)^{\mathbb{N}}$). Let $N(f)$ be the least such $N$. We set $\mathrm{sz}(f) = \sum_{j \in J} \mathrm{sz}(f(j))$ and the proof is by induction on $(\mathrm{sz}(f), N(f))$, ordered lexicographically.

If $\mathrm{sz}(f) = 0$ this means that $f(j) = e$ for all $j \in J$ and hence we can take $A = \varepsilon_J$. Assume that $\mathrm{sz}(f) > 0$; one can write<sup>8</sup> $f(j) = (m^j, a^j)$ with $m^j \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$ and $a^j \in R_\infty$ for each $j \in J$. Just as in the proof of Proposition 1 we choose a set $K$, a function $g : K \to R_\infty$ and an almost injective function $u : K \to J$ such that $m^j = [\, g(k) \mid u(k) = j \,]$. The set $K$ is finite since $J$ is, and we have $\mathrm{sz}(g) < \mathrm{sz}(f)$ because $\mathrm{sz}(f) > 0$. Therefore by inductive hypothesis there is a formula $B$ such that $\mathsf{d}(B) = K$ and $\langle B\rangle = g$. Let $f' : J \to R_\infty$ be defined by $f'(j) = a^j$; we have $\mathrm{sz}(f') \leq \mathrm{sz}(f)$ and $N(f') < N(f)$ and hence by inductive hypothesis there is a formula $C$ such that $\langle C\rangle = f'$. We set $A = (B \Rightarrow_u C)$, which satisfies $\langle A\rangle = f$ as required. ✷
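The choice of $K$, $g$ and $u$ in this proof can be made completely explicit; the following sketch performs it for finite data (the encoding of $K$ as pairs (index, position) is our assumption):

```python
def split_multisets(m):
    """Given a J-indexed family of finite multisets m[j] (as lists),
    return (K, g, u) with g : K -> elements and u : K -> J almost
    injective such that m[j] = [ g(k) | u(k) = j ] for every j."""
    K, g, u = set(), {}, {}
    for j, mj in m.items():
        for pos, a in enumerate(mj):
            k = (j, pos)          # the tag j keeps the fibers disjoint
            K.add(k)
            g[k] = a
            u[k] = j
    return K, g, u
```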

Theorem 1 still holds up to some mild adaptation. First notice that $A \sim B$ simply means now that $\mathsf{d}(A) = \mathsf{d}(B)$ and $\langle A\rangle = \langle B\rangle$.

Theorem 4. If $A$ and $B$ are such that $A \sim B$ then $A^{\mathsf{Id}} \vdash B$ with a proof $\pi$ which satisfies $\pi_x \in H(x)$.

<sup>8</sup> This is also possible if sz(f)=0 actually.

Proof. By induction on the sum of the sizes of $A$ and $B$. Assume that $A = \varepsilon_J$, so that $\mathsf{d}(B) = J$ and $\forall j \in J\; \langle B\rangle_j = e$. There are two cases as to $B$. In the first case $B$ is of shape $\varepsilon_K$, but then we must have $K = J$ and we can take for $\pi$ an axiom, so that $\pi_x = x \in H(x)$. Otherwise we have $B = (C \Rightarrow_u D)$ with $\mathsf{d}(D) = J$, $\forall j \in J\; \langle D\rangle_j = e$ and $\mathsf{d}(C) = \emptyset$, so that $u = 0_J$. We have $A \sim D$ and hence by inductive hypothesis we have a proof $\rho$ of $A^{\mathsf{Id}} \vdash D$ such that $\rho_x \in H(x)$. By weakening and $\Rightarrow$-introduction we get a proof $\pi$ of $A^{\mathsf{Id}} \vdash B$ which satisfies $\pi_x = \lambda y\, \rho_x \in H(x)$.

Assume that $A = (C \Rightarrow_u D)$. If $B = \varepsilon_J$ then we must have $\mathsf{d}(C) = \emptyset$, $u = 0_J$ and $D \sim B$, and hence by inductive hypothesis we have a proof $\rho$ of $D^{\mathsf{Id}} \vdash B$ such that $\rho_x \in H(x)$. By Lemma 7 there is a proof $\theta$ of $\vdash C$ such that $\theta \sim_\omega \Omega$. Hence there is a proof $\pi$ of $A^{\mathsf{Id}} \vdash B$ such that $\pi_x = \rho_y[(x)\,\theta/y] \in H(x)$.

Assume last that $B = (E \Rightarrow_v F)$; then we must have $D \sim F$ and there must be a bijection $w : \mathsf{d}(E) \to \mathsf{d}(C)$ such that $u \circ w = v$ and $w_*(E) \sim C$. We reason as in the proof of Lemma 1: by inductive hypothesis we have a proof $\rho$ of $D^{\mathsf{Id}} \vdash F$ and a proof $\mu$ of $w_*(E)^{\mathsf{Id}} \vdash C$, from which we build a proof $\pi$ of $A^{\mathsf{Id}} \vdash B$ such that $\pi_x = \lambda y\, \rho_z[(x)\,\mu_y/z] \in H(x)$ by inductive hypothesis. ✷

Theorem 5 (Soundness). Let $\pi$ be a deduction tree of $A_1^{u_1},\ldots,A_n^{u_n} \vdash B$ and $\vec{x}$ a sequence of $n$ pairwise distinct variables. Then the λ-term $\pi_{\vec{x}} \in \Lambda_\Omega$ satisfies $(x_i : \langle A_i^{u_i}\rangle_j)_{i=1}^n \vdash \pi_{\vec{x}} : \langle B\rangle_j$ in the $R_\infty$ intersection type system, for each $j \in \mathsf{d}(B)$.

The proof is exactly the same as that of Theorem 2, dropping all simple types.

For every λ-term $M \in \Lambda$, we define $H_\Omega(M)$ as the least subset of $\Lambda_\Omega$ such that:


The elements of $H_\Omega(M)$ can probably be seen as approximants of $M$.

Theorem 6 (Completeness). Let $J \subseteq I$ be finite. Let $M \in \Lambda_\Omega$ and $x_1,\ldots,x_n$ be pairwise distinct variables, such that $(x_i : m_i^j)_{i=1}^n \vdash M : b^j$ in the $R_\infty$ intersection type system for all $j \in J$. Let $A_1,\ldots,A_n$ and $B$ be formulas and let $u_1,\ldots,u_n$ be almost injective functions such that $u_i : \mathsf{d}(A_i) \to J = \mathsf{d}(B)$. Assume also that, for all $j \in J$, one has $\langle B\rangle_j = b^j$ and $\langle A_i^{u_i}\rangle_j = m_i^j$ for $i = 1,\ldots,n$. Then the judgment $A_1^{u_1},\ldots,A_n^{u_n} \vdash B$ has a proof $\pi$ such that $\pi_{\vec{x}} \in H_\Omega(M)$.

The proof is very similar to that of Theorem 3.

## 5 Concluding remarks and acknowledgments

The results presented in this paper show that, at least in non-idempotent intersection types, the problem of knowing whether all elements of a given family of intersection types $(a^j)_{j \in J}$ are inhabited by a common λ-term can be reformulated logically: is it true that one (or equivalently, any) of the indexed formulas $A$ such that $\mathsf{d}(A) = J$ and $\forall j \in J\; \langle A\rangle_j = a^j$ is provable in LJ(I)? Such a strong connection between intersection types and Indexed Linear Logic was already mentioned in the introduction of [2], but we never made it more explicit until now.

To conclude we propose a typed λ-calculus à la Church to denote proofs of the LJ(I) system of Section 4. The syntax of pre-terms is given by $s, t, \ldots ::= \Omega \mid x[J] \mid \lambda x{:}A^u\, s \mid (s)\, t$ where, in $x[J]$, $x$ is a variable and $J \subseteq I$, and, in $\lambda x{:}A^u\, s$, $u$ is an almost injective function from $\mathsf{d}(A)$ to a set $J \subseteq I$. Given a pre-term $s$ and a variable $x$, the domain of $x$ in $s$ is the subset $\mathsf{dom}(x, s)$ of $I$ given by $\mathsf{dom}(x, \Omega) = \emptyset$, $\mathsf{dom}(x, x[J]) = J$, $\mathsf{dom}(x, y[J]) = \emptyset$ if $y \neq x$, $\mathsf{dom}(x, \lambda y{:}A^u\, s) = \mathsf{dom}(x, s)$ (assuming of course $y \neq x$) and $\mathsf{dom}(x, (s)\, t) = \mathsf{dom}(x, s) \cup \mathsf{dom}(x, t)$. Then a pre-term $s$ is a term if any subterm of $s$ which is of shape $(s_1)\, s_2$ satisfies $\mathsf{dom}(x, s_1) \cap \mathsf{dom}(x, s_2) = \emptyset$ for every variable $x$. A typing judgment is an expression $(x_i : A_i^{u_i})_{i=1}^n \vdash s : B$ where the $x_i$'s are pairwise distinct variables, $s$ is a term and each $u_i$ is an almost injective function $\mathsf{d}(A_i) \to \mathsf{d}(B)$. The following typing rules exactly mimic the logical rules of LJ(I):

$$\frac{\mathsf{d}(A) = \emptyset}{(x_i : A_i^{0_\emptyset})_{i=1}^{n} \vdash \Omega : A}$$

$$\frac{q \neq i \Rightarrow \mathsf{d}(A_q) = \emptyset \qquad u_i \text{ bijection}}{(x_q : A_q^{u_q})_{q=1}^{n} \vdash x_i[\mathsf{d}(A_i)] : u_{i*}(A_i)} \qquad \frac{(x_i : A_i^{u_i})_{i=1}^{n},\; x : A^u \vdash s : B}{(x_i : A_i^{u_i})_{i=1}^{n} \vdash \lambda x{:}A^u\, s : A \Rightarrow_u B}$$
 
$$\frac{(x_i : (A_i{\restriction}_{\mathsf{dom}(x_i, s)})^{v_i})_{i=1}^{n} \vdash s : A \Rightarrow_u B \qquad (x_i : (A_i{\restriction}_{\mathsf{dom}(x_i, t)})^{w_i})_{i=1}^{n} \vdash t : A}{(x_i : A_i^{v_i + (u \circ w_i)})_{i=1}^{n} \vdash (s)\, t : B}$$
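To make the side condition on applications concrete, here is a small sketch of $\mathsf{dom}$ and of the pre-term/term distinction (the tuple encoding of pre-terms is our assumption; type annotations are irrelevant to $\mathsf{dom}$ and omitted):

```python
def dom(x, s):
    """dom(x, s): the set of indices at which x is used in the pre-term s.
    Pre-terms: ('var', y, J), ('lam', y, body), ('app', s1, s2);
    a binder ('lam', x, ...) hides the occurrences of x below it."""
    tag = s[0]
    if tag == 'var':
        return set(s[2]) if s[1] == x else set()
    if tag == 'lam':
        return dom(x, s[2]) if s[1] != x else set()
    return dom(x, s[1]) | dom(x, s[2])

def is_term(s, variables):
    """A pre-term is a term when every application subterm (s1) s2
    satisfies dom(x, s1) and dom(x, s2) disjoint, for all variables x."""
    tag = s[0]
    if tag == 'var':
        return True
    if tag == 'lam':
        return is_term(s[2], variables)
    s1, s2 = s[1], s[2]
    return (all(not (dom(x, s1) & dom(x, s2)) for x in variables)
            and is_term(s1, variables) and is_term(s2, variables))
```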

The properties of this calculus, and more specifically of its β-reduction, and its connections with the resource calculus of [9] will be explored in further work.

Another major objective will be to better understand the meaning of LJ(I) formulas, using ideas developed in [3], where a phase semantics is introduced and related to (non-uniform) coherence space semantics. In the present intuitionistic setting, it is tempting to look for Kripke-like interpretations, with the hope of generalizing indexed logic beyond the (perhaps too) specific relational setting we started from.

Last, we would like to thank Luigi Liquori and Claude Stolze for many helpful discussions on intersection types and the referees for their careful reading and insightful comments and suggestions.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **On Computability of Data Word Functions Defined by Transducers**⋆

Léo Exibard<sup>1,2</sup>⋆⋆, Emmanuel Filiot<sup>1</sup>⋆⋆⋆, and Pierre-Alain Reynier<sup>2</sup>†

<sup>1</sup> Université Libre de Bruxelles, Brussels, Belgium, leo.exibard@ulb.ac.be

<sup>2</sup> Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France

**Abstract.** In this paper, we investigate the problem of synthesizing computable functions of infinite words over an infinite alphabet (data ω-words). The notion of computability is defined through Turing machines with infinite inputs which can produce the corresponding infinite outputs in the limit. We use non-deterministic transducers equipped with registers, an extension of register automata with outputs, to specify functions. Such transducers may not define functions but more generally relations of data ω-words, and we show that it is PSpace-complete to test whether a given transducer defines a function. Then, given a function defined by some register transducer, we show that it is decidable (and again, PSpace-c) whether such a function is computable. As in the known finite alphabet case, we show that computability and continuity coincide for functions defined by register transducers, and show how to decide continuity. We also define a subclass for which those problems are PTime.

**Keywords:** Data Words · Register Automata · Register Transducers · Functionality · Continuity · Computability.

## **1 Introduction**

Context Program synthesis aims at deriving, in an automatic way, a program that fulfils a given specification. Such a setting is very appealing when, for instance, the specification describes, in some abstract formalism (an automaton or, ideally, a logic), important properties that the program must satisfy. The synthesised program is then correct-by-construction with regard to those properties. It is particularly important and desirable for the design of safety-critical systems with hard dependability constraints, which are notoriously hard to design correctly.

Program synthesis is hard to realise for general-purpose programming languages but important progress has been made recently in the automatic synthesis

⋆ A version with full proofs can be found at https://arxiv.org/abs/2002.08203.

⋆⋆ Funded by a FRIA fellowship from the F.R.S.-FNRS.

⋆⋆⋆ Research associate of F.R.S.-FNRS. Supported by the ARC Project Transform Fédération Wallonie-Bruxelles and the FNRS CDR J013116F and MIS F451019F projects.

† Partly funded by the ANR projects DeLTA (ANR-16-CE40-0007) and Ticktac (ANR-18-CE40-0015).

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 217–236, 2020. https://doi.org/10.1007/978-3-030-45231-5_12

of reactive systems. In this context, the system continuously receives input signals to which it must react by producing output signals. Such systems are not assumed to terminate and their executions are usually modelled as infinite words over the alphabets of input and output signals. A specification is thus a set of pairs (in, out), where in and out are infinite words, such that out is a legitimate output for in. Most methods for reactive system synthesis only work for synchronous systems over finite sets of input and output signals Σ and Γ. In this synchronous setting, input and output signals alternate, and thus implementations of such a specification are defined by means of synchronous transducers, which are Büchi automata with transitions of the form (q, σ, γ, q′), expressing that in state q, when getting input σ ∈ Σ, output γ ∈ Γ is produced and the machine moves to state q′. We aim at building deterministic implementations, in the sense that the output γ and the next state q′ uniquely depend on q and σ. The realisability problem of specifications given as synchronous non-deterministic transducers, by implementations defined by synchronous deterministic transducers, is known to be decidable [14,20]. In this paper, we are interested in the asynchronous setting, in which transducers can produce none or several outputs at once every time some input is read, i.e., transitions are of the form (q, σ, w, q′) where w ∈ Γ*. However, such a generalisation makes the realisability problem undecidable [2,9].
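In this synchronous deterministic setting, an implementation is essentially a letter-to-letter machine; as a toy sketch (the encoding and names are ours, not from the paper):

```python
def run_sync_transducer(delta, q0, inputs):
    """Deterministic synchronous transducer: delta maps (state, input
    letter) to (output letter, next state); one output per input."""
    q = q0
    for sigma in inputs:
        gamma, q = delta[(q, sigma)]
        yield gamma
```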

Synthesis of Transducers with Registers In the setting we just described, the set of signals is considered to be finite. This assumption is not realistic in general, as signals may come with unbounded information (e.g. process ids) that we call here data. To address this limitation, recent works have considered the synthesis of reactive systems processing data words [17,6,16,7]. Data words are infinite words over an alphabet Σ × D, where Σ is a finite set and D is a possibly infinite countable set. To handle data words, just as automata have been extended to register automata, transducers have been extended to register transducers. Such transducers are equipped with a finite set of registers in which they can store data and with which they can compare data for equality or inequality. While the realisability problem of specifications given as synchronous non-deterministic register transducers (NRTsyn) by implementations defined by synchronous deterministic register transducers (DRTsyn) is undecidable, decidability is recovered for specifications defined by universal register transducers and by giving as input the number of registers the implementation must have [7,17].

Computable Implementations In the previously mentioned works, both for finite or infinite alphabets, implementations are considered to be deterministic transducers. Such an implementation is guaranteed to use only a constant amount of memory (assuming data have size O(1)). While it makes sense with regard to memory efficiency, some problems turn out to be undecidable, as already mentioned: realisability of NRTsyn specifications by DRTsyn, or, in the finite alphabet setting, when both the specification and implementation are asynchronous. In this paper, we propose to study computable implementations, in the sense of (partial) functions f of data ω-words computable by some Turing machine M that has an infinite input x ∈ dom(f), and produces longer and longer prefixes of the output

f(x) as it reads longer and longer prefixes of the input x. Therefore, such a machine produces the output f(x) in the limit. We denote by TM the class of Turing machines computing functions in this sense. As an example, consider the function f that takes as input any data ω-word u = (σ1, d1)(σ2, d2)... and outputs (σ1, d1)ω if d1 occurs at least twice in u, and otherwise outputs u. This function is not computable, as a hypothetical machine could not output anything as long as d1 has not been met a second time. However, the following function g is computable. It is defined only on words (σ1, d1)(σ2, d2)... such that σ1σ2 ··· ∈ ((a + b)c*)ω, and it maps any (σi, di) to (σi, d1) if the next symbol in {a, b} is an a, and otherwise keeps (σi, di) unchanged. To compute it, a TM would need to store d1, and then wait until the next symbol in {a, b} is met before outputting anything. Since the input labels are necessarily in ((a + b)c*)ω, this machine will produce the whole output in the limit. Note that g cannot be defined by any deterministic register transducer, as it needs unbounded memory to be implemented.
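The computation of g sketched above can be written as a streaming procedure that buffers pairs until the next {a, b} label arrives (a sketch under our reading that "next" means the first {a, b} label strictly after the current position):

```python
def compute_g(stream):
    """Streaming sketch of g: the data at position i is replaced by d_1
    iff the first label in {a, b} strictly after i is an 'a'; labels are
    assumed to lie in ((a+b)c*)^omega, so every pair is eventually output."""
    d1 = None
    pending = []                     # pairs whose output is not decided yet
    for sigma, d in stream:
        if d1 is None:
            d1 = d                   # remember the very first datum
        if sigma in ('a', 'b'):
            for s, e in pending:     # their fate is now known
                yield (s, d1) if sigma == 'a' else (s, e)
            pending = [(sigma, d)]
        else:                        # sigma == 'c': postpone the decision
            pending.append((sigma, d))
```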

However, already in the finite alphabet setting, the problem of deciding if a specification given as some non-deterministic synchronous transducer is realisable by some computable function is open. The particular case of realisability by computable functions of universal domain (the set of all ω-words) is known to be decidable [12]. In the asynchronous setting, the undecidability proof of [2] can be easily adapted to show the undecidability of realisability of specifications given by non-deterministic (asynchronous) transducers by computable functions.

Functional Specifications As said before, a specification is in general a relation from inputs to outputs. If this relation is a function, we call it functional. Due to the negative results just mentioned about the synthesis of computable functions from non-functional specifications, we here focus instead on the case of functional specifications, and address the following general question: given the specification of a function of data ω-words, is this function "implementable", where we define "implementable" as "computable by some Turing machine"? Moreover, if it is implementable, then we want a procedure to automatically generate an algorithm that computes it. This raises another important question: how to decide whether a specification is functional? We investigate these questions for asynchronous register transducers, here simply called register transducers. This asynchrony allows for much more expressive power, but is a source of technical challenges.

Contributions In this paper, we solve the questions mentioned before for the class of (asynchronous) non-deterministic register transducers (NRT). We also give fundamental results on this class. In particular, we prove that:


6. those problems are in PTime for a subclass of NRT, called test-free NRT.

Finally, we also mention that considering the class of deterministic register transducers (DRT for short) instead of computable functions as a yardstick for the notion of being "implementable" for a function would yield undecidability. Indeed, given a function defined by some NRT, it is in general undecidable to check whether this function is realisable by some DRT, by a simple reduction from the universality problem of non-deterministic register automata [19].

Related Work The notion of continuity with respect to the Cantor distance is not new, and for rational functions over finite alphabets, it was already known to be decidable [21]. Its connection with computability for functions of ω-words over a finite alphabet has recently been investigated in [3] for one-way and two-way transducers. Our results lift some of theirs to the setting of data words. The model of test-free NRT can be seen as a one-way non-deterministic version of a model of two-way transducers considered in [5].

## **2 Data Words and Register Transducers**

For a (possibly infinite) set S, we denote by S<sup>∗</sup> (resp. S<sup>ω</sup>) the set of finite (resp. infinite) words over this alphabet, and we let S<sup>∞</sup> = S<sup>∗</sup> ∪ S<sup>ω</sup>. For a word u = u1 ...un, we denote by |u| = n its length and, by convention, |u| = ∞ for u ∈ S<sup>ω</sup>. The empty word is denoted ε. For 1 ≤ i ≤ j ≤ |u|, we let u[i:j] = ui ui+1 ...uj and u[i] = u[i:i] the i-th letter of u. For u, v ∈ S<sup>∞</sup>, we say that u is a prefix of v, written u ⊑ v, if there exists w ∈ S<sup>∞</sup> such that v = uw. In this case, we define u<sup>−1</sup>v = w. For u, v ∈ S<sup>∞</sup>, we say that u and v mismatch, written mismatch(u, v), when there exists a position i such that 1 ≤ i ≤ |u|, 1 ≤ i ≤ |v| and u[i] ≠ v[i]. Finally, for u, v ∈ S<sup>∞</sup>, we denote by u ∧ v their longest common prefix, i.e. the longest word w ∈ S<sup>∞</sup> such that w ⊑ u and w ⊑ v.
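The word operations just defined translate directly to code on finite words (the prefixes through which infinite words are handled); a small illustrative Python rendering:

```python
def is_prefix(u, v):
    """u is a prefix of v (finite words given as lists of letters)."""
    return len(u) <= len(v) and v[:len(u)] == u

def mismatch(u, v):
    """True iff some common position carries different letters in u and v."""
    return any(a != b for a, b in zip(u, v))

def lcp(u, v):
    """Longest common prefix of u and v (the operation written u ∧ v)."""
    w = []
    for a, b in zip(u, v):
        if a != b:
            break
        w.append(a)
    return w
```

Note that `zip` truncates at the shorter word, matching the requirement that a mismatch position must exist in both words.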

Data Words In this paper, Σ and Γ are two finite alphabets and D is a countably infinite set of data. We use the letter σ (resp. γ, d) to denote elements of Σ (resp. Γ, D). We also distinguish an arbitrary data value d0 ∈ D. Given a set R, let τ0<sup>R</sup> be the constant function defined by τ0<sup>R</sup>(r) = d0 for all r ∈ R. Given a finite alphabet A, a labelled data is a pair x = (a, d) ∈ A × D, where a is the label and d the data. We define the projections lab(x) = a and dt(x) = d. A data word over A and D is an infinite sequence of labelled data, i.e. a word w ∈ (A × D)<sup>ω</sup>. We extend the projections lab and dt to data words naturally, i.e. lab(w) ∈ A<sup>ω</sup> and dt(w) ∈ D<sup>ω</sup>. A data word language is a subset L ⊆ (A × D)<sup>ω</sup>. Note that here data words are infinite; otherwise, they are explicitly called finite data words.

### **2.1 Register Transducers**

Register transducers are transducers recognising data word relations. They are an extension of finite transducers to data word relations, in the same way register automata [15] are an extension of finite automata to data word languages. Here, we define them over infinite data words with a Büchi acceptance condition, and allow multiple registers to contain the same data, with a syntax close to [18]. The current data can be compared for equality with the register contents via tests, which are symbolic and defined as Boolean formulas of the following form. Given a set R of registers, a test is a formula φ generated by the following grammar:

$$\phi ::= \top \mid \perp \mid r^{=} \mid r^{\neq} \mid \phi \land \phi \mid \phi \lor \phi \mid \neg \phi$$

where r ∈ R. Given a valuation τ : R → D, a test φ and a data d, we denote by τ, d ⊨ φ the satisfaction of φ by d in valuation τ, defined as τ, d ⊨ r<sup>=</sup> if τ(r) = d and τ, d ⊨ r<sup>≠</sup> if τ(r) ≠ d. The Boolean connectives behave as usual. We denote by TstR the set of (symbolic) tests over R.
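As an illustration, the satisfaction relation τ, d ⊨ φ can be evaluated by structural recursion on the test. The Python sketch below uses a hypothetical tuple encoding of tests (our choice, not the paper's concrete syntax):

```python
def holds(test, tau, d):
    """Evaluate a symbolic test against a valuation tau: R -> D and a data d.

    Tests are encoded as nested tuples: ('top',), ('bot',), ('eq', r),
    ('neq', r), ('and', p, q), ('or', p, q), ('not', p).
    """
    op = test[0]
    if op == 'top':
        return True
    if op == 'bot':
        return False
    if op == 'eq':   # r= : the current data equals the content of r
        return tau[test[1]] == d
    if op == 'neq':  # r≠ : the current data differs from the content of r
        return tau[test[1]] != d
    if op == 'and':
        return holds(test[1], tau, d) and holds(test[2], tau, d)
    if op == 'or':
        return holds(test[1], tau, d) or holds(test[2], tau, d)
    if op == 'not':
        return not holds(test[1], tau, d)
    raise ValueError(f"unknown test constructor: {op!r}")
```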

**Definition 1.** A non-deterministic register transducer (NRT) is a tuple T = (Q, R, i0, F, Δ), where Q is a finite set of states, i0 ∈ Q is the initial state, F ⊆ Q is the set of accepting states, R is a finite set of registers and Δ ⊆ Q × Σ × TstR × 2<sup>R</sup> × (Γ × R)<sup>∗</sup> × Q is a finite set of transitions. We write q −σ,φ|asgn,o→T q′ for (q, σ, φ, asgn, o, q′) ∈ Δ (T is sometimes omitted).

The semantics of a register transducer is given by a labelled transition system: we define LT = (C, Λ, →), where C = Q × (R → D) is the set of configurations, Λ = (Σ × D) × (Γ × D)<sup>∗</sup> is the set of labels, and we have, for all (q, τ), (q′, τ′) ∈ C and all (l, w) ∈ Λ, that (q, τ) −(l,w)→ (q′, τ′) whenever there exists a transition q −σ,φ|asgn,o→T q′ such that, writing l = (σ′, d) and w = (γ′1, d1)...(γ′n, dn): σ′ = σ, τ, d ⊨ φ, τ′ coincides with τ except that τ′(r) = d for all r ∈ asgn, and o = (γ′1, r1)...(γ′n, rn) with di = τ′(ri) for all 1 ≤ i ≤ n.
Then, a run of T is an infinite sequence of configurations and transitions ρ = (q0, τ0) −(u1,v1)→ (q1, τ1) −(u2,v2)→ ··· . Its input is in(ρ) = u1u2 ..., its output is out(ρ) = v1 · v2 ··· . We also define its sequence of states st(ρ) = q0q1 ... and its trace tr(ρ) = u1·v1·u2·v2 ··· . Such a run is initial if (q0, τ0) = (i0, τ0<sup>R</sup>). It is final if it satisfies the Büchi condition, i.e. inf(st(ρ)) ∩ F ≠ ∅, where inf(st(ρ)) = {q ∈ Q | q = qi for infinitely many i}. Finally, it is accepting if it is both initial and final. We then write (q0, τ0) −u|v→T to express that there is a final run ρ of T starting from (q0, τ0) such that in(ρ) = u and out(ρ) = v. In the whole paper, and unless stated otherwise, we always assume that the output of an accepting run is infinite (v ∈ (Γ × D)<sup>ω</sup>), which can be ensured by a Büchi condition.
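A single step of this labelled transition system can be sketched in Python. The tuple encoding of transitions and the assign-then-output order are illustrative assumptions consistent with the semantics described above (note that outputting from the updated valuation is what lets a register store and emit the current data in one transition, as register r0 does in Figure 1):

```python
def step(config, transition, letter):
    """Apply one NRT transition to configuration (q, tau) on input (sigma, d).

    transition = (src, sigma, test, asgn, out, dst), where test is a
    predicate on (tau, d), asgn a set of registers and out a list of
    (output label, register) pairs. Returns (new_config, produced) or
    None when the transition does not apply. Illustrative sketch.
    """
    (q, tau) = config
    (src, sigma_t, test, asgn, out, dst) = transition
    (sigma, d) = letter
    if q != src or sigma != sigma_t or not test(tau, d):
        return None
    tau2 = dict(tau)
    for r in asgn:                 # every register in asgn receives d
        tau2[r] = d
    # the output word is built from the *updated* valuation tau2
    produced = [(gamma, tau2[r]) for (gamma, r) in out]
    return ((dst, tau2), produced)
```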

A partial run is a finite prefix of a run. The notions of input, output and states are extended by taking the corresponding prefixes. We then write (q0, τ0) −u|v→T (qn, τn) to express that there is a partial run ρ of T starting from configuration (q0, τ0) and ending in configuration (qn, τn) such that in(ρ) = u and out(ρ) = v.

Finally, the relation represented by a transducer T is:

$$\begin{aligned} [\![T]\!] = \{ (u, v) \in (\Sigma \times \mathcal{D})^\omega \times (\Gamma \times \mathcal{D})^\omega \mid {} & \text{there exists an accepting run } \rho \text{ of } T \\ & \text{such that } \text{in}(\rho) = u \text{ and } \text{out}(\rho) = v \} \end{aligned}$$

Example 2. As an example, consider the register transducer Trename depicted in Figure 1. It realises the following transformation: consider a setting in which we deal with logs of communications between a set of clients. Such a log is an infinite sequence of pairs consisting of a tag, chosen in some finite alphabet Σ, and the identifier of the client delivering this tag, chosen in some infinite set of data values. The transformation should modify the log as follows: for a given client that needs to be modified, each of its messages should now be associated with some new identifier. The transformation has to verify that this new identifier is indeed free, i.e. never used in the log. Before treating the log, the transformation receives as input the id of the client that needs to be modified (associated with the tag del), and then a sequence of identifiers (associated with the tag ch), ending with #. The transducer is non-deterministic as it has to guess which of these identifiers it can choose to replace the one of the client. In particular, observe that it may associate multiple output words with the same input if two such free identifiers exist.

**Fig. 1.** A register transducer Trename. It has three registers r1, r2 and r0, and four states. σ denotes any letter in Σ; r1 stores the id of del and r2 the chosen id of ch, while r0 is used to output the last data value read as input. As we only assign data to single registers, we write ri for the singleton assignment set {ri}.

Finite Transducers Since we reduce the decision of continuity and functionality of NRT to that of finite transducers, let us introduce them: a finite transducer (NFT for short) is an NRT with 0 registers (i.e. R = ∅). Thus, its transition relation can be represented as Δ ⊆ Q × Σ × Γ<sup>∗</sup> × Q. A direct extension of the construction of [15, Proposition 1] allows one to show that:

**Proposition 3.** Let T be an NRT with k registers, and let X ⊂f D be a finite subset of data. Then ⟦T⟧ ∩ ((Σ × X)<sup>ω</sup> × (Γ × X)<sup>ω</sup>) is recognised by an NFT of exponential size, more precisely with O(|Q| × |X|<sup>|R|</sup>) states.

#### **2.2 Technical Properties of Register Automata**

Although automata are simpler machines than transducers, we only use them as tools in our proofs, which is why we define them from transducers, and not the other way around. A non-deterministic register automaton, denoted NRA, is a transducer without outputs: its transition relation is Δ ⊆ Q × Σ × TstR × 2<sup>R</sup> × {ε} × Q (simply represented as Δ ⊆ Q × Σ × TstR × 2<sup>R</sup> × Q). The semantics is the same, except that we lift the condition that the output v is infinite, since there is no output: the output of an accepting run is necessarily ε. For an NRA A, we denote L(A) = {u ∈ (Σ × D)<sup>ω</sup> | there exists an accepting run ρ of A over u}. In this section, we establish technical properties of NRA.

Proposition 4, the so-called "indistinguishability property", was shown in the seminal paper by Kaminski and Francez [15, Proposition 1]. Their model differs in that they do not allow distinct registers to contain the same data, and in the corresponding test syntax, but their result easily carries over to our setting. It states that if an NRA accepts a data word, then this data word can be relabelled with data from any set containing d0 and having at least k + 1 elements. Indeed, at any point in time, the automaton can only store at most k data in its registers, so its notion of "freshness" is a local one, and forgotten data can thus be reused as fresh ones. Moreover, as the automaton only tests data for equality, their actual values do not matter, except for d0, which is initially contained in the registers.

Such a "small-witness" property is fundamental to NRA, and will be paramount in establishing decidability of functionality (Section 3) and computability (Section 4). We use it jointly with Lemma 5, which states that the interleaving of the traces of runs of an NRT can be recognised by an NRA, and Lemma 6, which expresses that an NRA can check whether interleaved words coincide on some bounded prefix, and/or mismatch before some given position.

**Proposition 4 ([15]).** Let A be an NRA with k registers. If L(A) ≠ ∅, then, for any X ⊆ D of size |X| ≥ k + 1 such that d0 ∈ X, L(A) ∩ (Σ × X)<sup>ω</sup> ≠ ∅.

The runs of a register transducer T can be flattened to their traces, so as to be recognised by an NRA. Those traces can then be interleaved, in order to be compared. The proofs of the following properties are straightforward.

Let ρ1 = (q0, τ0) −(u1,u′1)→ (q1, τ1) ··· and ρ2 = (p0, μ0) −(v1,v′1)→ (p1, μ1) ··· be two runs of a transducer T. Then, we define their interleaving ρ1 ⊗ ρ2 = u1 · u′1 · v1 · v′1 · u2 · u′2 · v2 · v′2 ··· and L⊗(T) = {ρ1 ⊗ ρ2 | ρ1 and ρ2 are accepting runs of T}.
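The interleaving operation itself is straightforward to write down; a Python sketch over finite portions of runs, each given as a list of (input letter, finite output word) pairs:

```python
def interleave(run1, run2):
    """Interleave the traces of two (finite portions of) runs.

    Each run is a list of (u, uo) pairs, where u is the input letter and
    uo the finite output word of one transition. The result is the flat
    word u1 . u'1 . v1 . v'1 . u2 . u'2 . v2 . v'2 ... as in the text.
    """
    out = []
    for (u, uo), (v, vo) in zip(run1, run2):
        out.append(u)
        out.extend(uo)
        out.append(v)
        out.extend(vo)
    return out
```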

**Lemma 5.** If T has k registers, then L⊗(T) is recognised by an NRA with 2k registers.

**Lemma 6.** Let i, j ∈ N ∪ {∞}. We define M<sup>i</sup><sub>j</sub> = {u1 u′1 v1 v′1 ··· | ∀k ≥ 1, uk, vk ∈ (Σ × D), u′k, v′k ∈ (Γ × D)<sup>∗</sup>, ∀1 ≤ k ≤ j, vk = uk and |u′1 · u′2 ··· ∧ v′1 · v′2 ···| ≤ i}. Then, M<sup>i</sup><sub>j</sub> is recognisable by an NRA with 2 registers, and with 1 register if i = ∞.

## **3 Functionality, Equivalence and Composition of NRT**

In general, since they are non-deterministic, NRT may not define functions but relations, as illustrated by Example 2. In this section, we first show that deciding whether a given NRT defines a function is PSpace-complete, in which case we call it functional. We show, as a consequence, that testing whether two functional NRT define functions which coincide on their common domain is PSpace-complete. Finally, we show that functions defined by NRT are closed under composition. This is an appealing property in transducer theory, as it allows one to define complex functions by composing simpler ones.

Example 7. As explained before, the transducer Trename described in Example 2 is not functional. To gain functionality, one can reinforce the specification by considering that one gets at the beginning a list of k possible identifiers, and that one has to select the first one which is free, for some fixed k. This transformation is realised by the register transducer Trename2 depicted in Figure 2 (for k = 2).

**Fig. 2.** An NRT Trename2, with four registers r1, r2, r3 and r0 (the latter being used, as in Figure 1, to output the last read data). After reading the # symbol, it guesses whether the value of register r2 appears in the suffix of the input word. If not, it goes to state 5 and replaces occurrences of r1 by r2. Otherwise, it moves to state 6, waiting for an occurrence of r2, and replaces occurrences of r1 by r3.

Let us start with the functionality problem in the data-free case. It is already known that checking whether an NFT over ω-words is functional is decidable [13,11]. By relying on the pattern logic of [10] designed for transducers of finite words, it can be shown that it is decidable in NLogSpace.

**Proposition 8.** Deciding whether an NFT is functional is in NLogSpace.

The following theorem shows that a relation between data words defined by an NRT with k registers is a function iff its restriction to a set of at least 2k + 3 data values is a function. As a consequence, functionality is decidable, as it reduces to the functionality problem of transducers over a finite alphabet.

**Theorem 9.** Let T be an NRT with k registers. Then, for all X ⊆ D of size |X| ≥ 2k + 3 such that d0 ∈ X, we have that T is functional if and only if ⟦T⟧ ∩ ((Σ × X)<sup>ω</sup> × (Γ × X)<sup>ω</sup>) is functional.

Proof. The left-to-right direction is trivial. Now, assume T is not functional. Let x ∈ (Σ × D)<sup>ω</sup> be such that there exist y, z ∈ (Γ × D)<sup>ω</sup> with y ≠ z and (x, y), (x, z) ∈ ⟦T⟧. Let i = |y ∧ z|. Then, consider the language L = {ρ1 ⊗ ρ2 | ρ1 and ρ2 are accepting runs of T, in(ρ1) = in(ρ2) and |out(ρ1) ∧ out(ρ2)| ≤ i}. Since, by Lemma 5, L⊗(T) is recognised by an NRA with 2k registers and, by Lemma 6, M<sup>i</sup><sub>∞</sub> is recognised by an NRA with 2 registers, we get that L = L⊗(T) ∩ M<sup>i</sup><sub>∞</sub> is recognised by an NRA with 2k + 2 registers.

Now, L ≠ ∅ since, by letting ρ1 and ρ2 be the runs of T both with input x and with respective outputs y and z, we have that w = ρ1 ⊗ ρ2 ∈ L. Let X ⊆ D be such that |X| ≥ 2k + 3 and d0 ∈ X. By Proposition 4, we get that L ∩ (Σ × X)<sup>ω</sup> ≠ ∅. By letting w′ = ρ′1 ⊗ ρ′2 ∈ L ∩ (Σ × X)<sup>ω</sup>, and x′ = in(ρ′1) = in(ρ′2), y′ = out(ρ′1) and z′ = out(ρ′2), we have that (x′, y′), (x′, z′) ∈ ⟦T⟧ ∩ ((Σ × X)<sup>ω</sup> × (Γ × X)<sup>ω</sup>) and |y′ ∧ z′| ≤ i, so, in particular, y′ ≠ z′ (since both are infinite words). Thus, ⟦T⟧ ∩ ((Σ × X)<sup>ω</sup> × (Γ × X)<sup>ω</sup>) is not functional.

As a consequence of Proposition 8 and Theorem 9, we obtain the following result. The lower bound is obtained by encoding non-emptiness of register automata, which is PSpace-complete [4].

**Corollary 10.** Deciding whether an NRT T is functional is PSpace-complete.

Hence, the following problem on the equivalence of NRT is decidable:

**Theorem 11.** The problem of deciding, given two functions f,g defined by NRT, whether for all <sup>x</sup> <sup>∈</sup> dom(f) <sup>∩</sup> dom(g), <sup>f</sup>(x) = <sup>g</sup>(x), is PSpace-complete.

Proof. The formula ∀x ∈ dom(f) ∩ dom(g) · f(x) = g(x) holds iff the relation f ∪ g = {(x, y) | y = f(x) ∨ y = g(x)} is a function. The latter can be decided by testing whether the disjoint union of the transducers defining f and g defines a function, which is in PSpace by Corollary 10. For hardness, we again reduce from the emptiness problem of an NRA A over finite words, just as in the proof of Corollary 10. In particular, the functions f1 and f2 defined in that proof (which have the same domain) are equal iff L(A) = ∅.

Note that under the promise that f and g have the same domain, the latter theorem implies that it is decidable to check whether the two functions are equal. However, checking dom(f) = dom(g) is undecidable, as the language-equivalence problem for non-deterministic register automata is undecidable, since, in particular, universality is undecidable [19].

Closure under composition is a desirable property for transducers, which holds in the data-free setting [1]. We show that it also holds for functional NRT.

**Theorem 12.** Let f,g be two functions defined by NRT. Then, their composition f ◦ g is (effectively) definable by some NRT.

Proof (Sketch). By f ◦ g we mean f ◦ g : x ↦ f(g(x)). Assume f and g are defined by Tf = (Qf, Rf, q0, Ff, Δf) and Tg = (Qg, Rg, p0, Fg, Δg) respectively. Wlog we assume that the input and output finite alphabets of Tf and Tg are all equal to Σ, and that Rf and Rg are disjoint. We construct T such that ⟦T⟧ = f ◦ g. The proof is similar to the data-free case, where composition is shown via a product construction which simulates both transducers in parallel, executing the second on the output of the first. Assume Tg has some transition p −σ,φ|{r},o→ q where o ∈ (Σ × Rg)<sup>∗</sup>. Then T has to be able to execute transitions of Tf while processing o, even though o does not contain any concrete data values (this is the main difference with the data-free setting). However, if T knows the equality types between Rf and Rg, then it is able to trigger the transitions of Tf. For example, assume that o = (a, rg) and that the content of rg is equal to the content of rf, rf being a register of Tf. Then, if Tf has some transition of the form p′ −a,r<sup>=</sup>f|{r′f},o′→ q′, T can trigger the transition (p, p′) −σ,φ|{r}∪{r′f := rg},o′→ (q, q′), where the operation r′f := rg is syntactic sugar on top of NRT that intuitively means "put the content of rg into r′f".

Remark 13. The proof of Theorem 12 does not use the hypothesis that f and g are functions, and actually shows a stronger result, namely that relations defined by NRT are closed under composition.

## **4 Computability and Continuity**

We equip the set of (finite or infinite) data words with the usual distance: for u, v ∈ (Σ × D)<sup>ω</sup>, d(u, v) = 0 if u = v and d(u, v) = 2<sup>−|u∧v|</sup> otherwise. A sequence of (finite or infinite) data words (xn)n∈N converges to some infinite data word x if for all ε > 0, there exists N ≥ 0 such that for all n ≥ N, d(xn, x) ≤ ε.
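On finite prefixes, this distance is directly computable; an illustrative Python version:

```python
def cantor_distance(u, v):
    """d(u, v) = 0 if u = v, and 2^(-|u ∧ v|) otherwise.

    Computed here on finite words (prefixes of the infinite words
    under consideration), given as lists of letters.
    """
    if u == v:
        return 0.0
    n = 0                       # length of the longest common prefix
    for a, b in zip(u, v):
        if a != b:
            break
        n += 1
    return 2.0 ** (-n)
```

Two words are close exactly when they agree on a long prefix, which is the intuition behind both continuity and computability in the limit.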

In order to reason with computability, we assume in the sequel that the infinite set of data values D we are dealing with has an effective representation. For instance, this is the case when <sup>D</sup> <sup>=</sup> <sup>N</sup>.

We now define how a Turing machine can compute a function of data words. We consider deterministic Turing machines with three tapes: a read-only one-way input tape (containing the infinite input data word), a two-way working tape, and a write-only one-way output tape (on which the machine writes the infinite output data word). Consider some input data word x ∈ (Σ × D)<sup>ω</sup>. For any integer k ∈ N, we let M(x, k) denote the output written by M on its output tape after having read the first k cells of the input tape. Observe that, as the output tape is write-only, the sequence of data words (M(x, k))k≥0 is non-decreasing.

**Definition 14 (Computability).** A function f : (Σ × D)<sup>ω</sup> → (Γ × D)<sup>ω</sup> is computable if there exists a deterministic multi-tape machine M such that for all <sup>x</sup> <sup>∈</sup> dom(f), the sequence (M(x, k))k≥<sup>0</sup> converges to <sup>f</sup>(x).

**Definition 15 (Continuity).** A function f : (Σ × D)<sup>ω</sup> → (Γ × D)<sup>ω</sup> is continuous at x ∈ dom(f) if (equivalently):


Then, f is continuous if and only if it is continuous at each x ∈ dom(f). Finally, a functional NRT T is continuous when ⟦T⟧ is continuous.

Example 16. We give an example of a non-continuous function f. The finite input and output alphabets are unary, and are therefore ignored in the description of f. This function associates with every sequence s = d1d2 ··· ∈ D<sup>ω</sup> the word f(s) = d1<sup>ω</sup> if d1 occurs infinitely many times in s, and otherwise f(s) = s itself.

The function f is not continuous. Indeed, taking d ≠ d′, the sequence of data words d(d′)<sup>n</sup>d<sup>ω</sup> converges to d(d′)<sup>ω</sup>, while f(d(d′)<sup>n</sup>d<sup>ω</sup>) = d<sup>ω</sup> converges to d<sup>ω</sup> ≠ f(d(d′)<sup>ω</sup>) = d(d′)<sup>ω</sup>.
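The divergence can be checked numerically on finite prefixes; the sketch below instantiates d = 0 and d′ = 1 (these concrete values are our choice):

```python
# Witness of non-continuity for f (Example 16), with d = 0, d' = 1.
# We work on length-L prefixes of the infinite words involved.
L = 12

def lcp_len(u, v):
    """Length of the longest common prefix |u ∧ v|."""
    k = 0
    for a, b in zip(u, v):
        if a != b:
            break
        k += 1
    return k

def x_n(n):
    """Prefix of d (d')^n d^omega, i.e. 0 1^n 0^omega."""
    return ([0] + [1] * n + [0] * L)[:L]

x_lim = ([0] + [1] * L)[:L]   # prefix of the limit d (d')^omega

def f_img(n):
    """f(x_n) = d^omega, since d = 0 occurs infinitely often in x_n."""
    return [0] * L

# the inputs converge to x_lim: their common prefix grows with n...
assert lcp_len(x_n(5), x_lim) == 6 and lcp_len(x_n(9), x_lim) == 10
# ...but the images all stay at distance 2^(-1) from f(x_lim) = x_lim
assert all(lcp_len(f_img(n), x_lim) == 1 for n in range(1, 8))
```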

Moreover, f is realisable by some NRT which non-deterministically guesses whether d1 repeats infinitely many times or not. It needs only one register r in which to store d1. In the first case, it checks that the current data equals the content of r infinitely often; in the second case, it checks that this test succeeds only finitely many times, using Büchi conditions.

One can show that the register transducer Trename2 considered in Example 7 also realises a function which is not continuous, as the value stored in register r2 may appear arbitrarily far in the input word. One can modify the specification to obtain a continuous function as follows: instead of considering a single infinite log, one now considers an infinite sequence of finite logs, separated by $ symbols. The register transducer Trename3, depicted in Figure 3, defines such a function.

**Fig. 3.** A register transducer Trename3. This transducer is non-deterministic, yet it defines a continuous function.

We now prove the equivalence between continuity and computability for functions defined by NRT. One direction, namely that computability implies continuity, is easy, almost by definition. For the other direction, we rely on the following lemma, which states that it is decidable whether a word v can be safely output knowing only a prefix u of the input. In particular, given a function f, we let f̂ be the function defined over all finite prefixes u of words in dom(f) by f̂(u) = ∧(f(uy) | uy ∈ dom(f)), the longest common prefix of all outputs of continuations of u by f. Then, we have the following decidability result:

**Lemma 17.** The following problem is decidable. Given an NRT T defining a function f, and two finite data words u ∈ (Σ × D)<sup>∗</sup> and v ∈ (Γ × D)<sup>∗</sup>, decide whether v ⊑ f̂(u).

**Theorem 18.** Let f be a function defined by some NRT T. Then f is continuous iff f is computable.

Proof. ⇐ Assuming f = ⟦T⟧ is computable by some Turing machine M, we show that f is continuous. Indeed, consider some x ∈ dom(f) and some i ≥ 0. As the sequence of finite words (M(x, k))k∈N converges to f(x) and these words have non-decreasing lengths, there exists j ≥ 0 such that |M(x, j)| ≥ i. Hence, for any data word y ∈ dom(f) such that |x ∧ y| ≥ j, the behaviour of M on y is the same during the first j steps, as M is deterministic, and thus |f(x) ∧ f(y)| ≥ i, showing that f is continuous at x.

⇒ Assume that f is continuous. We describe a Turing machine computing f; the corresponding procedure is formalised as Algorithm 1. When reading a finite prefix x[:j] of its input x ∈ dom(f), it computes the set Pj of all configurations (q, τ) reached by T on x[:j]. This set is updated for increasing values of j. The machine also keeps in memory the finite word oj that has been output so far. For any j, if dt(x[:j]) denotes the set of data that appear in x[:j], the algorithm then decides, for each pair (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d0}), whether (σ, d) can safely be output, i.e., whether all accepting runs on words of the form x[:j]y, for an infinite word y, output at least oj·(σ, d). The latter can be decided, given T, oj and x[:j], by Lemma 17. Note that it suffices to look at data in dt(x[:j]) ∪ {d0} only since, by definition of NRT, any data that is output is necessarily stored in some register, and therefore appears in x[:j] or is equal to d0. Let us show that

**Algorithm 1:** Algorithm describing the machine <sup>M</sup>f computing <sup>f</sup>.

**Data:** x ∈ dom(f)
**1** o := ε;
**2** **for** j = 0 **to** ∞ **do**
**3** ⋅⋅ **for** (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d0}) **do**
**4** ⋅⋅⋅⋅ **if** o·(σ, d) ⊑ f̂(x[:j]) **then** // such a test is decidable by Lemma 17
**5** ⋅⋅⋅⋅⋅⋅ o := o·(σ, d);
**6** ⋅⋅⋅⋅⋅⋅ output (σ, d);
**7** ⋅⋅⋅⋅ **end**
**8** ⋅⋅ **end**
**9** **end**
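Algorithm 1 can be rendered as the following Python sketch, in which the decidable test of Lemma 17 is abstracted behind a predicate; the names `M_f` and `safe_to_output`, and the toy identity instantiation, are ours:

```python
def M_f(x, safe_to_output, Sigma, d0):
    """One-pass-per-input-letter sketch of Algorithm 1.

    x is an iterable of input letters (sigma, d); safe_to_output(o, prefix, sym)
    abstracts the decidable test "o.sym is a prefix of f^(prefix)" of
    Lemma 17 (its concrete implementation depends on the transducer T).
    Yields the output letters as they become safe.
    """
    o = []          # output produced so far (o_j in the proof)
    prefix = []     # input read so far (x[:j])
    for letter in x:
        prefix.append(letter)
        data_seen = {d for (_, d) in prefix} | {d0}
        for sigma in Sigma:
            for d in sorted(data_seen):
                sym = (sigma, d)
                if safe_to_output(o, prefix, sym):
                    o.append(sym)
                    yield sym

# toy instantiation where f is the identity: (sigma, d) is safe to emit
# exactly when o.(sigma, d) is a prefix of the input read so far
def safe_id(o, prefix, sym):
    cand = o + [sym]
    return cand == prefix[:len(cand)]
```

With `safe_id`, the machine reproduces its input letter by letter; for an actual NRT, `safe_to_output` would be implemented via the decision procedure of Lemma 17.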

Mf actually computes f. Let x ∈ dom(f). We have to show that the sequence (Mf(x, j))j converges to f(x). Let oj be the content of the variable o of Mf when exiting the inner loop at line 8, after the outer loop (line 2) has been executed j times (hence j input symbols have been read). Note that oj = Mf(x, j). We have o1 ⊑ o2 ⊑ ··· and oj ⊑ f̂(x[:j]) for all j ≥ 0. Hence, oj ⊑ f(x) for all j ≥ 0. To show that (oj)j converges to f(x), it remains to show that (oj)j does not stabilise, i.e. oi1 ≺ oi2 ≺ ··· for some infinite subsequence i1 < i2 < ··· . First, note that f being continuous is equivalent to the sequence (f̂(x[:k]))k converging to f(x). Therefore, f(x) ∧ f̂(x[:k]) can be made arbitrarily long by taking k sufficiently large. Let j ≥ 0 and (σ, d) = f(x)[|oj| + 1]. By the latter property and the fact that oj·(σ, d) ⊑ f(x), there necessarily exists some k > j such that oj·(σ, d) ⊑ f̂(x[:k]). Moreover, by definition of NRT, d is necessarily a data value that appears in some prefix of x, so there exists k′ ≥ k such that d appears in x[:k′] and oj·(σ, d) ⊑ f̂(x[:k]) ⊑ f̂(x[:k′]). This entails that oj·(σ, d) ⊑ ok′. So we have shown that for all j, there exists k′ > j such that oj ≺ ok′, which concludes the proof.

Now that we have shown that computability is equivalent to continuity for functions defined by NRT, we exhibit a pattern which allows us to decide continuity. This pattern generalises the one of [3] to the setting of data words, the difficulty lying in showing that our pattern can be restricted to a finite number of data.

**Theorem 19.** Let $T$ be a functional NRT with $k$ registers. Then, for all $X \subseteq \mathcal{D}$ such that $|X| \ge 2k + 3$ and $d_0 \in X$, $T$ is not continuous at some $x \in (\Sigma \times \mathcal{D})^\omega$ if and only if $T$ is not continuous at some $z \in (\Sigma \times X)^\omega$.

Proof. The right-to-left direction is trivial. Now, let $T$ be a functional NRT with $k$ registers which is not continuous at some $x \in (\Sigma \times \mathcal{D})^\omega$. Let $f : \mathrm{dom}([\![T]\!]) \to (\Gamma \times \mathcal{D})^\omega$ be the function defined by $T$, as: for all $u \in \mathrm{dom}([\![T]\!])$, $f(u) = v$ where $v \in (\Gamma \times \mathcal{D})^\omega$ is the unique data word such that $(u, v) \in [\![T]\!]$.

Now, let $X \subseteq \mathcal{D}$ be such that $|X| \ge 2k + 3$ and $d_0 \in X$. We need to build two words $u$ and $v$ labelled over $X$ which coincide on a sufficiently long prefix to allow for pumping, hence yielding a converging sequence of input data words whose images do not converge, witnessing non-continuity. To that end, we use a proof technique similar to that of Theorem 9: we show that the language of interleaved runs whose inputs coincide on a sufficiently long prefix while their respective outputs mismatch before a given position is recognisable by an NRA, allowing us to use the indistinguishability property. We also ask that one run presents sufficiently many occurrences of a final state $q_f$, so that we can ensure that there exists a pair of configurations containing $q_f$ which repeats in both runs.

On reading such u and v, the automaton behaves as a finite automaton, since the number of data is finite ([15, Proposition 1]). By analysing the respective runs, we can, using pumping arguments, bound the position on which the mismatch appears in u, then show the existence of a synchronised loop over u and v after such position, allowing us to build the sought witness for non-continuity.

*Relabel over $X$.* Thus, assume $T$ is not continuous at some point $x \in (\Sigma \times \mathcal{D})^\omega$. Let $\rho$ be an accepting run of $T$ over $x$, and let $q_f \in \inf(\mathrm{st}(\rho)) \cap F$ be an accepting state repeating infinitely often in $\rho$. Then, let $i \ge 0$ be such that for all $j \ge 0$, there exists $y \in \mathrm{dom}(f)$ such that $|x \wedge y| \ge j$ but $|f(x) \wedge f(y)| \le i$. Now, define $K = |Q| \times (2k + 3)^{2k}$ and let $m = (2i + 3) \times (K + 1)$. Finally, pick $j$ such that $\rho[1{:}j]$ contains at least $m$ occurrences of $q_f$. Consider the language:

$$L = \left\{ \rho_1 \otimes \rho_2 \,\middle|\, |\mathrm{in}(\rho_1) \wedge \mathrm{in}(\rho_2)| \ge j,\ |\mathrm{out}(\rho_1) \wedge \mathrm{out}(\rho_2)| \le i,\ \text{and there are at least } m \text{ occurrences of } q_f \text{ in } \rho_1[1{:}j] \right\}$$

By Lemma 5, $L_\otimes(T)$ is recognised by an NRA with $2k$ registers. Additionally, by Lemma 6, $M^i_j$ is recognised by an NRA with 2 registers. Thus, $L = L_\otimes(T) \cap O^{q_f}_{m,j} \cap M^i_j$, where $O^{q_f}_{m,j}$ checks that there are at least $m$ occurrences of $q_f$ in $\rho_1[1{:}j]$ (this is easily doable from the automaton recognising $L_\otimes(T)$ by adding an $m$-bounded counter), is recognisable by an NRA with $2k + 2$ registers.

Choose $y \in \mathrm{dom}(f)$ such that $|x \wedge y| \ge j$ but $|f(x) \wedge f(y)| \le i$. By letting $\rho_1$ (resp. $\rho_2$) be an accepting run of $T$ over $x$ (resp. $y$), we have $\rho_1 \otimes \rho_2 \in L$, so $L \ne \emptyset$. By Proposition 4, $L \cap ((\Sigma \times X)^\omega \times (\Gamma \times X)^\omega) \ne \emptyset$. Let $w = \rho'_1 \otimes \rho'_2 \in L \cap ((\Sigma \times X)^\omega \times (\Gamma \times X)^\omega)$, $u = \mathrm{in}(\rho'_1)$ and $v = \mathrm{in}(\rho'_2)$. Then, $|u \wedge v| \ge j$, $|f(u) \wedge f(v)| \le i$ and there are at least $m$ occurrences of $q_f$ in $\rho'_1[1{:}j]$.

Now, we depict $\rho'_1$ and $\rho'_2$ in Figure 4, where we decompose $u$ as $u = u_1 \dots u_m \cdot s$ and $v$ as $v = u_1 \dots u_m \cdot t$; their corresponding images being respectively $u' = u'_1 \dots u'_m \cdot s'$ and $u'' = u''_1 \dots u''_m \cdot t''$. We also let $l = (i+1)(K+1)$ and $l' = 2(i+1)(K+1)$. Since the data of $u$, $v$ and $w$ belong to $X$, we know that $\tau_i, \mu_i : R \to X$.

**Fig. 4.** Runs of $f$ over $u = u_1 \dots u_m \cdot s$ and $v = u_1 \dots u_m \cdot t$.

*Repeating configurations.* First, let us observe that in a partial run of $\rho'_1$ containing more than $|Q| \times |X|^k$ occurrences of $q_f$, there is at least one productive transition, i.e. a transition whose output $o \ne \varepsilon$. Otherwise, by the pigeonhole principle, there would exist a configuration $\mu : R \to X$ such that $(q_f, \mu)$ occurs at least twice in the partial run. Since all transitions are improductive, this would mean that, writing $w$ for the corresponding part of the input, $(q_f, \mu) \xrightarrow{w|\varepsilon}_T (q_f, \mu)$. This partial run is part of $\rho'_1$, so, in particular, $(q_f, \mu)$ is accessible; hence, by taking $w_0$ such that $(i_0, \tau_0) \xrightarrow{w_0|w'_0}_T (q_f, \mu)$, we would have $f(w_0 w^\omega) = w'_0$, which is a finite word, contradicting our assumption that all accepting runs produce an infinite output. This implies that, for any $n \ge |Q| \times |X|^k$ (in particular for $n = l$), $|u'_1 \dots u'_n| \ge i + 1$.
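The pigeonhole step used here and below (a long enough run must revisit a configuration) is easy to make concrete; the following sketch is purely illustrative and not part of the construction, with `first_repeated_configuration` a name of our own choosing:

```python
def first_repeated_configuration(configs):
    """Return the indices (p, p2) of the first configuration occurring
    twice in a finite run (a list of hashable configurations, e.g.
    (state, register-valuation) pairs), or None if all are distinct.
    By the pigeonhole principle, a run visiting more than
    |Q| * |X|**k configurations must contain such a repeating pair."""
    seen = {}
    for idx, conf in enumerate(configs):
        if conf in seen:
            return seen[conf], idx
        seen[conf] = idx
    return None

# One state and one register over X = {1, 2}: at most two distinct
# configurations, so this length-3 run necessarily repeats one.
run = [("qf", (1,)), ("qf", (2,)), ("qf", (1,))]
```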

*Locate the mismatch.* Again, upon reading $u_{l+1} \dots u_{l'}$, there are $(i+1)(K+1)$ occurrences of $q_f$. There are two cases:

(a) There are at least $i + 1$ productive transitions in $\rho'_2$. Then, we obtain that $|u''_1 \dots u''_{l'}| > i$, so $\mathrm{mismatch}(u'_1 \dots u'_{l'}, u''_1 \dots u''_{l'})$, since we know $|f(u) \wedge f(v)| \le i$ and they are respectively prefixes of $f(u)$ and $f(v)$, both of length at least $i + 1$. Afterwards, upon reading $u_{l'+1} \dots u_m$, there are $K + 1 > |Q| \times |X|^{2k}$ occurrences of $q_f$, so, by the pigeonhole principle, there is a repeating pair: there exist indices $p$ and $p'$ such that $l' \le p < p' \le m$ and $(q_f, \mu_p) = (q_f, \mu_{p'})$, $(q_p, \tau_p) = (q_{p'}, \tau_{p'})$. Thus, let $z_P = u_1 \dots u_p$, $z_R = u_{p+1} \dots u_{p'}$ and $z_C = u_{p'+1} \dots u_m \cdot t$ ($P$ stands for prefix, $R$ for repeat and $C$ for continuation; we use capital letters to avoid confusion with indices). Denoting by $z'_P = u'_1 \dots u'_p$, $z'_R = u'_{p+1} \dots u'_{p'}$, $z''_P = u''_1 \dots u''_p$, $z''_R = u''_{p+1} \dots u''_{p'}$ and $z''_C = u''_{p'+1} \dots u''_m \cdot t''$ the corresponding images, $z = z_P \cdot z_R^{\,\omega}$ is a point of discontinuity. Indeed, define $(z_n)_{n \in \mathbb{N}}$ by $z_n = z_P \cdot z_R^{\,n} \cdot z_C$ for all $n \in \mathbb{N}$. Then, $(z_n)_{n \in \mathbb{N}}$ converges towards $z$, but, since $f(z_n) = z''_P \cdot z''_R{}^{n} \cdot z''_C$ for all $n \in \mathbb{N}$, we have that $f(z_n) \not\longrightarrow_{n \to \infty} f(z) = z'_P \cdot z'_R{}^{\omega}$, since $\mathrm{mismatch}(z'_P, z''_P)$.

(b) Otherwise, by the same reasoning as above, there exists a repeating pair with only improductive transitions in between: there exist indices $p$ and $p'$ such that $l \le p < p' \le l'$, $(q_f, \mu_p) = (q_f, \mu_{p'})$, $(q_p, \tau_p) = (q_{p'}, \tau_{p'})$, and $(q_f, \mu_p) \xrightarrow{u_{p+1} \dots u_{p'}|\varepsilon} (q_f, \mu_{p'})$, $(q_p, \tau_p) \xrightarrow{u_{p+1} \dots u_{p'}|\varepsilon} (q_{p'}, \tau_{p'})$. Then, by taking $z_P = u_1 \dots u_p$, $z_R = u_{p+1} \dots u_{p'}$ and $z_C = u_{p'+1} \dots u_m \cdot t$, we have, by letting $z'_P = u'_1 \dots u'_p$, $z'_R = u'_{p+1} \dots u'_{p'}$, $z''_P = u''_1 \dots u''_p$, $z''_R = \varepsilon$ and $z''_C = u''_{p'+1} \dots u''_m \cdot t''$, that $z = z_P \cdot z_R^{\,\omega}$ is a point of discontinuity. Indeed, define $(z_n)_{n \in \mathbb{N}}$ by $z_n = z_P \cdot z_R^{\,n} \cdot z_C$ for all $n \in \mathbb{N}$. Then, $(z_n)_{n \in \mathbb{N}}$ indeed converges towards $z$, but, since $f(z_n) = z''_P \cdot z''_C$ for all $n \in \mathbb{N}$, we have that $f(z_n) \not\longrightarrow_{n \to \infty} f(z) = z'_P \cdot z'_R{}^{\omega}$, since $\mathrm{mismatch}(z'_P, z''_P \cdot z''_C)$ (the mismatch necessarily lies in $z'_P$, since $|z'_P| \ge i + 1$).

**Corollary 20.** Deciding whether an NRT defines a continuous function is PSpace-complete.

Proof. Let $X \subseteq \mathcal{D}$ be a set of size $2k + 3$ containing $d_0$. By Theorem 19, $T$ is not continuous iff it is not continuous at some $z \in (\Sigma \times X)^\omega$, iff $[\![T]\!] \cap ((\Sigma \times X)^\omega \times (\Gamma \times X)^\omega)$ is not continuous. By Proposition 3, this relation is recognisable by a finite transducer $T_X$ with $O(|Q| \times |X|^{|R|})$ states, which can be built on-the-fly. By [3], the continuity of functions defined by NFT is decidable in NLogSpace, which yields a PSpace procedure.

For the hardness, we reduce again from the emptiness problem of register automata, which is PSpace-complete [4]. Let $A$ be a register automaton over some alphabet $\Sigma \times \mathcal{D}$. We construct a transducer $T$ which defines a continuous function iff $L(A) = \emptyset$ iff the domain of $T$ is empty. Let $f$ be a non-continuous function realised by some NRT $H$ (it exists by Example 16). Then, let $\# \notin \Sigma$ be a fresh symbol, and define $g$ as the function mapping any data word of the form $w(\#, d)w'$ to $w(\#, d)f(w')$ if $w \in L(A)$. The function $g$ is realised by an NRT which simulates $A$ and copies its input to the output to implement the identity, until it sees $\#$. If it was in some accepting state of $A$ before seeing $\#$, it branches to some initial state of $H$ and proceeds by executing $H$. If there is some $w_0 \in L(A)$, then the subfunction $g_{w_0}$ mapping words of the form $w_0(\#, d)w'$ to $w_0(\#, d)f(w')$ is not continuous, since $f$ is not. Hence $g$ is not continuous. Conversely, if $L(A) = \emptyset$, then $\mathrm{dom}(g) = \emptyset$, so $g$ is continuous.

In [3], non-continuity is characterised by a specific pattern (Lemma 21, Figure 1), i.e. the existence of some particular sequence of transitions. By applying this characterisation to the finite transducer recognising $[\![T]\!] \cap ((\Sigma \times X)^\omega \times (\Gamma \times X)^\omega)$, as constructed in Proposition 3, we can characterise non-continuity by a similar pattern, which will prove useful to decide (non-)continuity of test-free NRT in NLogSpace (cf. Section 5):

**Corollary 21 ([3]).** Let $T$ be an NRT with $k$ registers. Then, for all $X \subseteq \mathcal{D}$ such that $|X| \ge 2k + 3$ and $d_0 \in X$, $T$ is not continuous at some $x \in (\Sigma \times \mathcal{D})^\omega$ if and only if it has the pattern of Figure 5.

**Fig. 5.** A pattern characterising non-continuity of functions definable by an NRT: we ask that there exist configurations $(q_f, \mu)$ and $(q, \tau)$, where $q_f$ is accepting, as well as finite input data words $u, v$, finite output data words $u', v', u'', v''$, and an infinite input data word $w$ admitting an accepting run from configuration $(q, \tau)$ producing output $w''$, such that $\mathrm{mismatch}(u', u'') \vee (v'' = \varepsilon \wedge \mathrm{mismatch}(u', u''w''))$.

## **5 Test-free Register Transducers**

In [7], we introduced a restriction which allows one to recover decidability of the bounded synthesis problem for specifications expressed as non-deterministic register automata. Applied to transducers, this restriction also yields polynomial complexities for the functionality and computability problems.

An NRT $T$ is test-free when its transition function does not depend on the tests conducted over the input data. Formally, we say that $T$ is test-free if for all transitions $q \xrightarrow{\sigma, \phi \mid \mathit{asgn}, o}_T q'$ we have $\phi = \top$. Thus, we can omit the tests altogether, and its transition relation can be represented as $\Delta' \subseteq Q \times \Sigma \times 2^R \times (\Gamma \times R)^* \times Q$.

Example 22. Consider the function $f : (\Sigma \times \mathcal{D})^\omega \to (\Gamma \times \mathcal{D})^\omega$ associating, to $x = (\sigma_1, d_1)(\sigma_2, d_2)\dots$, the value $(\sigma_1, d_1)(\sigma_2, d_1)(\sigma_3, d_1)\dots$ if there are infinitely many $a$'s in $x$, and $(\sigma_1, d_2)(\sigma_2, d_2)(\sigma_3, d_2)\dots$ otherwise.

$f$ can be implemented by a test-free NRT with one register: it initially guesses whether there are infinitely many $a$'s in $x$; if so, it stores $d_1$ in the single register $r$, otherwise it waits for the next input to get $d_2$ and stores it in $r$. Then, it outputs the content of $r$ along with each $\sigma_i$. $f$ is not continuous, as even outputting the first datum requires reading an infinite prefix when $d_1 \ne d_2$.
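To see why no finite prefix determines the output, one can simulate both nondeterministic guesses of such a transducer on a finite prefix; this is only an illustrative sketch (`branch_outputs` is our own helper, not the authors' construction):

```python
def branch_outputs(prefix):
    """Simulate both guesses of the one-register test-free NRT of
    Example 22 on a non-empty finite prefix (list of (label, datum)
    pairs).  Branch A guesses 'infinitely many a's' and stores d1;
    branch B guesses the opposite and stores d2, delaying its output
    until d2 has been read.  Returns the outputs of each branch."""
    labels = [s for s, _ in prefix]
    d1 = prefix[0][1]
    out_a = [(s, d1) for s in labels]
    if len(prefix) < 2:
        return out_a, []        # branch B has not seen d2 yet
    d2 = prefix[1][1]
    out_b = [(s, d2) for s in labels]
    return out_a, out_b

# When d1 != d2, the two candidate outputs already disagree on the
# very first datum, so nothing can be safely emitted early.
out_a, out_b = branch_outputs([("a", 1), ("b", 2), ("a", 3)])
```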

Note that when a transducer is test-free, the existence of an accepting run over a given input x only depends on its finite labels. Hence, the existence of two outputs y and z which mismatch over data can be characterised by a simple pattern (Figure 6), which allows to decide functionality in polynomial time:

## **Theorem 23.** Deciding whether a test-free NRT is functional is in PTime.

Proof. Let $T$ be a test-free NRT such that $T$ is not functional. Then, there exist $x \in (\Sigma \times \mathcal{D})^\omega$ and $y, z \in (\Gamma \times \mathcal{D})^\omega$ such that $(x, y), (x, z) \in [\![T]\!]$ and $y \ne z$. Then, let $i$ be such that $y[i] \ne z[i]$. There are two cases. Either $\mathrm{lab}(y[i]) \ne \mathrm{lab}(z[i])$, which means that the finite transducer obtained by ignoring the registers of $T$ is not functional. By Proposition 8, this property can be decided in NLogSpace, so let us focus on the second case: $\mathrm{dt}(y[i]) \ne \mathrm{dt}(z[i])$.

**Fig. 6.** A situation characterising the existence of a mismatch over data. Since acceptance does not depend on data, we can always choose $x$ such that $\mathrm{dt}(x[j]) \ne \mathrm{dt}(x[j'])$. Here, we assume that the labels of $x$, $y$ and $z$ range over a unary alphabet; in particular $y[i] = x[j]$ iff $\mathrm{dt}(y[i]) = \mathrm{dt}(x[j])$. Finally, for readability, we did not write that $r'$ should not be reassigned between $j'$ and $l'$. Note that the position of $i$ with regard to $j$, $j'$, $l$ and $l'$ does not matter; nor does the position of $l$ w.r.t. $l'$.

We here give a sketch of the proof: observe that an input $x$ admits two outputs which mismatch over data if and only if it admits two runs which respectively store $x[j]$ and $x[j']$ such that $x[j] \ne x[j']$ and output them later at the same output position $i$; the outputs $y$ and $z$ are then such that $\mathrm{dt}(y[i]) \ne \mathrm{dt}(z[i])$. Since $T$ is test-free, the existence of two runs over the same input $x$ only depends on its finite labels. Then, the registers containing respectively $x[j]$ and $x[j']$ should not be reassigned before being output, and should indeed output their content at the same position $i$ (cf. Figure 6). Besides, again because of test-freeness, we can always assume that $x$ is such that $x[j] \ne x[j']$. Overall, this pattern can be checked by a 2-counter Parikh automaton, whose emptiness is decidable in PTime [8] (under conditions that are satisfied here).

Now, let us move to the case of continuity. Here again, the fact that test-free NRTs conduct no test over the input data allows us to focus on the only two registers responsible for the mismatch, the existence of an accepting run being determined only by finite labels.

**Theorem 24.** Deciding whether a test-free NRT defines a continuous function is in PTime.

Proof. Let $T$ be a test-free NRT. First, it can be shown that $T$ is not continuous if and only if $T$ has the pattern of Figure 7, where $r$ is coaccessible (since acceptance only depends on finite labels, $T$ can be trimmed<sup>3</sup> in polynomial time).

**Fig. 7.** A pattern characterising non-continuity of functions defined by NRT, where we ask that there exist some states $q_f$, $q$ and $r$, where $q_f$ is accepting, as well as finite input data words $u, v, z$ and finite output data words $u', v', u'', v'', z''$ such that $\mathrm{mismatch}(u', u'') \vee (v'' = \varepsilon \wedge \mathrm{mismatch}(u', u''z''))$. Register assignments are not depicted, as there are no conditions on them. We unrolled the loops to highlight the fact that they do not necessarily loop back to the same configuration.

Now, it remains to show that such simpler pattern can be checked in PTime. We treat each part of the disjunction separately:


$q \xrightarrow{z|z''} r$, where $q_f \in F$ and $\mathrm{mismatch}(u', u''z'')$. By examining again the proof of Theorem 23, it can be shown that to obtain a mismatch, it suffices that the input is the same for both runs only up to position $\max(j, j')$. More precisely, there is a mismatch between $u'$ and $u''z''$ if and only if there exist two registers $r$ and $r'$ and two positions $j, j' \in \{1, \dots, |u|\}$ such that $j \ne j'$, $r$ is stored at position $j$, $r'$ is stored at position $j'$, $r$ and $r'$ are respectively output at input positions $l \in \{1, \dots, |u|\}$ and $l' \in \{1, \dots, |uz|\}$, and they are not reassigned in the meantime. Again, this property, along with the fact that $q_f \in F$ and the existence of a synchronised loop, can be checked by a 2-counter Parikh automaton of polynomial size.

Overall, deciding whether a test-free NRT is continuous is in PTime.

<sup>3</sup> We say that T is trim when all its states are both accessible and coaccessible.

## **References**



### **Minimal Coverability Tree Construction Made Complete and Efficient**

Alain Finkel<sup>1,3</sup>, Serge Haddad<sup>1,2</sup>, and Igor Khmelnitsky<sup>1,2</sup>

<sup>1</sup> LSV, ENS Paris-Saclay, CNRS, Université Paris-Saclay, Cachan, France {finkel,haddad,khmelnitsky}@lsv.fr <sup>2</sup> Inria, France <sup>3</sup> Institut Universitaire de France, France

**Abstract.** Downward closures of Petri net reachability sets can be finitely represented by their set of maximal elements called the minimal coverability set or Clover. Many properties (coverability, boundedness, ...) can be decided using Clover, in a time proportional to the size of Clover. So it is crucial to design algorithms that compute it efficiently. We present a simple modification of the original but incomplete Minimal Coverability Tree algorithm (MCT), computing Clover, which makes it complete: it memorizes accelerations and fires them as ordinary transitions. Contrary to the other alternative algorithms for which no bound on the size of the required additional memory is known, we establish that the additional space of our algorithm is at most doubly exponential. Furthermore we have implemented a prototype MinCov which is already very competitive: on benchmarks it uses less space than all the other tools and its execution time is close to the one of the fastest tool.

**Keywords:** Petri nets · Karp-Miller tree algorithm · Coverability · Minimal coverability set · Clover · Minimal coverability tree.

## **1 Introduction**

**Coverability and coverability set in Petri nets.** Petri nets are an iconic infinite-state model used for verifying concurrent systems. Coverability is the most studied property of Petri nets, for several reasons: (1) many properties like mutual exclusion, safety, and control-state reachability reduce to coverability, (2) the coverability problem is EXPSPACE-complete (while reachability is non-elementary), and (3) there exist efficient prototypes and numerous case studies. To solve the coverability problem, there are backward and forward algorithms. But these algorithms do not address relevant problems like repeated coverability, LTL model-checking, boundedness, and regularity of the traces.

However, these problems are EXPSPACE-complete [4, 1] and are also decidable using the Karp-Miller tree algorithm (KMT) [11], which computes a finite tree

\* The work was carried out in the framework of ReLaX, UMI2000 and also supported by ANR-17-CE40-0028 project BRAVAS.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 237–256, 2020. https://doi.org/10.1007/978-3-030-45231-5\_13

labeled by a set of ω*-markings* $C \subseteq \mathbb{N}_\omega^P$ (where $\mathbb{N}_\omega$ is the set of naturals enlarged with an upper bound ω and $P$ is the set of places) such that the reachability set and the finite set $C$ have the same downward closure in $\mathbb{N}^P$. Thus a marking **m** is coverable if there exists some $\mathbf{m}' \ge \mathbf{m}$ with $\mathbf{m}' \in C$. Hence, $C$ can be seen as *one* among all the possible finite representations of the infinite downward closure of the reachability set. This set $C$ allows, for instance, solving multiple instances of coverability in time linear w.r.t. the size of $C$, avoiding many calls to a costly algorithm. Informally, the KMT algorithm builds a reachability tree but, in order to ensure termination, substitutes ω for some finite components of the marking of a vertex when some marking of an ancestor is smaller.
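The ω-substitution step can be sketched as follows, with markings as tuples over a fixed ordering of places and `OMEGA = float('inf')` standing in for ω; this is an illustrative rendering of the classical acceleration, not the algorithm contributed by this paper:

```python
OMEGA = float("inf")  # stands in for the symbolic upper bound omega

def accelerate(ancestor_markings, m):
    """Karp-Miller acceleration: for each ancestor marking m0 that is
    componentwise <= m, replace every strictly increased component of
    m by OMEGA (the corresponding place can be pumped unboundedly)."""
    for m0 in ancestor_markings:
        if all(a <= b for a, b in zip(m0, m)):
            m = tuple(OMEGA if a < b else b for a, b in zip(m0, m))
    return m

# Ancestor (1, 0) lies below the current marking (2, 0): the first
# place is unbounded, so its component is replaced by omega.
```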

Unfortunately C may contain comparable markings while only the maximal elements are important. The set of maximal elements of C can be defined independently of the KMT algorithm and was called the *minimal coverability set (MCS)* in [6] and abbreviated as the *Clover* in the more general framework of Well Structured Transition Systems (WSTS) [7].

**The minimal coverability tree algorithm.** So in [5, 6] the author computes the minimal coverability set by modifying the KMT algorithm in such a way that at each step of the algorithm, the set of ω-markings labelling vertices is an antichain. But this aggressive strategy, implemented by the so-called Minimal Coverability Tree algorithm (MCT), contains a subtle bug and it may compute a strict under-approximation of Clover as shown in [8, 10].

**Alternative minimal coverability set algorithms.** Since the discovery of this bug, three algorithms (with variants) [10, 14, 13] have been designed for computing the minimal coverability set without building the full Karp-Miller tree. In [10] the authors proposed a minimal coverability set algorithm (called CovProc) that is not based on the Karp-Miller tree algorithm but uses a similar but restricted introduction of ω's. In [14], Reynier and Servais proposed a modification of the MCT, called the Monotone-Pruning algorithm (called MP), that keeps but "deactivates" vertices labeled with smaller ω-markings while MCT would have deleted them. Recently in [15], the authors simplified their original proof of correctness. In [16], Valmari and Hansen proposed another algorithm (denoted below as VH) for constructing the minimal coverability set without deleting vertices. Their algorithm builds a graph and not a tree as usual. In [13], Piipponen and Valmari improved this algorithm by designing appropriate data structures and heuristics for exploration strategy that may significantly decrease the size of the graph.

### **Our contributions.**

1. We introduce the concept of *abstraction* as an ω-transition that mimics the effect, w.r.t. coverability, of an infinite family of firing sequences. As a consequence, adding abstractions to the net does not modify its coverability set. Moreover, the classical Karp-Miller *acceleration* can be formalized as an abstraction whose incidence on places is either ω or null. The set of accelerations of a net is upward closed and well-ordered. Hence there exists a finite subset of minimal accelerations, and we show that the size of every minimal acceleration is bounded by a double exponential.


**Organization.** Section 2 introduces abstractions and accelerations and studies their properties. Section 3 presents our algorithm and establishes its correctness. Section 4 describes our tool and discusses the results of the benchmarks. We conclude and give some perspectives to this work in Section 5. One can find all the missing proofs and an illustration of the behavior of the algorithm in [9].

## **2 Covering abstractions**

#### **2.1 Petri nets: reachability and covering**

Here we define Petri nets differently from the usual way, but in an equivalent manner, i.e. based on the backward incidence matrix **Pre** and the incidence matrix **C**. The forward incidence matrix is implicitly defined by **C** + **Pre**. This choice is motivated by the introduction of abstractions in Section 2.2.

**Definition 1.** *A Petri net (PN) is a tuple* $\mathcal{N} = \langle P, T, \mathbf{Pre}, \mathbf{C} \rangle$ *where:*


*<sup>A</sup>* marked *Petri net* (<sup>N</sup> , **<sup>m</sup>**0) *is a Petri net* <sup>N</sup> *equipped with an initial marking* **<sup>m</sup>**<sup>0</sup> <sup>∈</sup> <sup>N</sup><sup>P</sup> *.*

The column vector of matrix $\mathbf{Pre}$ (resp. $\mathbf{C}$) indexed by $t \in T$ is denoted $\mathbf{Pre}(t)$ (resp. $\mathbf{C}(t)$). A transition $t \in T$ is *fireable* from a marking $\mathbf{m} \in \mathbb{N}^P$ if $\mathbf{m} \ge \mathbf{Pre}(t)$. When $t$ is fireable from $\mathbf{m}$, its *firing* leads to the marking $\mathbf{m}' \stackrel{\text{def}}{=} \mathbf{m} + \mathbf{C}(t)$, denoted by $\mathbf{m} \xrightarrow{t} \mathbf{m}'$. One extends fireability and firing to a sequence $\sigma \in T^*$ by induction on its length. The empty sequence $\varepsilon$ is always fireable and leaves the marking unchanged. Let $\sigma = t\sigma'$ be a sequence with $t \in T$ and $\sigma' \in T^*$. Then $\sigma$ is fireable from $\mathbf{m}$ if $\mathbf{m} \xrightarrow{t} \mathbf{m}'$ and $\sigma'$ is fireable from $\mathbf{m}'$. The firing of $\sigma$ from $\mathbf{m}$ leads to the marking $\mathbf{m}''$ reached by $\sigma'$ from $\mathbf{m}'$. One also denotes this firing by $\mathbf{m} \xrightarrow{\sigma} \mathbf{m}''$.
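These definitions translate directly into code. The sketch below is illustrative only; it represents markings and the columns $\mathbf{Pre}(t)$, $\mathbf{C}(t)$ as tuples over a fixed ordering of places:

```python
def fireable(m, pre_t):
    """t is fireable from m iff m >= Pre(t) componentwise."""
    return all(x >= y for x, y in zip(m, pre_t))

def fire(m, pre_t, c_t):
    """Firing t from m yields m' = m + C(t)."""
    assert fireable(m, pre_t)
    return tuple(x + y for x, y in zip(m, c_t))

def fire_sequence(m, sigma, pre, c):
    """Fire a sequence of transition names; pre and c map each
    transition name to its Pre and C column vectors."""
    for t in sigma:
        m = fire(m, pre[t], c[t])
    return m

# A two-place net: t consumes one token from p1 and puts one in p2,
# i.e. Pre(t) = (1, 0) and C(t) = (-1, 1).
pre = {"t": (1, 0)}
c = {"t": (-1, 1)}
```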

**Definition 2.** *Let* (<sup>N</sup> , **<sup>m</sup>**0) *be a marked net. The* reachability set Reach(<sup>N</sup> , **<sup>m</sup>**0) *is defined by:*

$$\operatorname{Reach}(\mathcal{N}, \mathbf{m}\_0) = \{ \mathbf{m} \mid \exists \sigma \in T^\* \; \mathbf{m}\_0 \stackrel{\sigma}{\longrightarrow} \mathbf{m} \}$$

In order to introduce the coverability set of a Petri net, let us recall some definitions and results related to ordered sets. Let (X, <sup>≤</sup>) be an ordered set. The downward (resp. upward) *closure* of a subset <sup>E</sup> <sup>⊆</sup> <sup>X</sup> is denoted by <sup>↓</sup> <sup>E</sup> (resp. <sup>↑</sup>E) and defined by:

<sup>↓</sup><sup>E</sup> <sup>=</sup> {<sup>x</sup> <sup>∈</sup> <sup>X</sup> | ∃<sup>y</sup> <sup>∈</sup> E y <sup>≥</sup> <sup>x</sup>} (resp. <sup>↑</sup><sup>E</sup> <sup>=</sup> {<sup>x</sup> <sup>∈</sup> <sup>X</sup> | ∃<sup>y</sup> <sup>∈</sup> E y <sup>≤</sup> <sup>x</sup>})

A subset <sup>E</sup> <sup>⊆</sup> <sup>X</sup> is downward (resp. upward) *closed* if <sup>E</sup> <sup>=</sup>↓<sup>E</sup> (resp. <sup>E</sup> <sup>=</sup>↑E).

An *antichain* $E$ is a set which fulfills: $\forall x \ne y \in E,\ \neg(x \le y \vee y \le x)$. $X$ is said to be *FAC* (for Finite AntiChains) if all its antichains are finite. A nonempty set $E \subseteq X$ is *directed* if for all $x, y \in E$ there exists $z \in E$ such that $x \le z$ and $y \le z$. An *ideal* is a set which is downward closed and directed. There exists an equivalent characterization of FAC sets which provides a finite description of any downward closed set: a set is FAC if and only if every downward closed set admits a finite decomposition into ideals (a proof of this well-known result can be found in [3]).
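In particular, a finite downward closed set is fully described by its maximal elements, which form an antichain. An illustrative helper (our own, assuming markings given as equal-length tuples compared componentwise):

```python
def leq(m1, m2):
    """Componentwise order on markings given as equal-length tuples."""
    return all(a <= b for a, b in zip(m1, m2))

def maximal_elements(markings):
    """Return the antichain of componentwise-maximal markings: the
    smallest subset with the same downward closure as the input."""
    ms = set(markings)
    return {m for m in ms if not any(m != n and leq(m, n) for n in ms)}
```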

$X$ is *well founded* if all its (strictly) decreasing sequences are finite. $X$ is *well ordered* if it is FAC and well founded. There are many equivalent characterizations of well order. For instance, a set $X$ is well ordered if and only if every sequence $(x_n)_{n \in \mathbb{N}}$ in $X$ has a non-decreasing infinite subsequence. This characterization allows one to design algorithms that compute trees whose finiteness is ensured by well order. Let us recall that $(\mathbb{N}, \le)$ and $(\mathbb{N}^P, \le)$ are well ordered sets.

We are now ready to introduce the *cover* (also called the coverability set) of a net and to state some of its properties.

**Definition 3.** *Let* (<sup>N</sup> , **<sup>m</sup>**0) *be a marked Petri net.* Cover(<sup>N</sup> , **<sup>m</sup>**0)*, its* coverability set*, is defined by:*

$$\operatorname{Cover}(\mathcal{N}, \mathbf{m}\_0) = \downarrow \operatorname{Reach}(\mathcal{N}, \mathbf{m}\_0).$$

Since the coverability set is downward closed and $\mathbb{N}^P$ is FAC, it admits a finite decomposition into ideals. The ideals of $\mathbb{N}^P$ can be defined in an elegant way as follows. One first extends the sets of naturals and integers: $\mathbb{N}_\omega = \mathbb{N} \cup \{\omega\}$ and $\mathbb{Z}_\omega = \mathbb{Z} \cup \{\omega\}$. Then one extends the order relation and the addition to $\mathbb{Z}_\omega$: for all $n \in \mathbb{Z}$, $\omega > n$, and for all $n \in \mathbb{Z}_\omega$, $n + \omega = \omega + n = \omega$. $\mathbb{N}_\omega^P$ is also a well ordered set and its members are called ω*-markings*. There is a one-to-one mapping between ideals of $\mathbb{N}^P$ and ω-markings. Let $\mathbf{m} \in \mathbb{N}_\omega^P$. Define $\mathbb{I}[\mathbf{m}]$ by:

$$\mathbb{I}\left[\mathbf{m}\right] = \left\{\mathbf{m}' \in \mathbb{N}^P \mid \mathbf{m}' \le \mathbf{m}\right\},$$

$\mathbb{I}[\mathbf{m}]$ is an ideal of $\mathbb{N}^P$ (and every ideal can be defined in this way). Let $\Omega$ be a set of ω-markings; $\mathbb{I}[\Omega]$ denotes the set $\bigcup_{\mathbf{m}\in\Omega} \mathbb{I}[\mathbf{m}]$. Due to the above properties, there exists a unique finite set of minimal size $\operatorname{Clover}(\mathcal{N}, \mathbf{m}_0) \subseteq \mathbb{N}_\omega^P$ such that:

$$\operatorname{Cover}(\mathcal{N}, \mathbf{m}_0) = \mathbb{I}[\operatorname{Clover}(\mathcal{N}, \mathbf{m}_0)]$$

A more general result can be found in [3] for well structured transition systems.
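As an illustration of the correspondence between ω-markings and ideals, here is a sketch (ours; the two-place clover used below is hypothetical) encoding ω as `math.inf`, so that membership in $\mathbb{I}[\mathbf{m}]$ and in $\mathbb{I}[\operatorname{Clover}]$ become simple componentwise tests:

```python
# Sketch: ω-markings as finite representations of ideals of N^P.
# ω is encoded as math.inf; the ideal of an ω-marking m is the set of
# markings componentwise below m, and a downward closed set is
# represented by a finite set of maximal ω-markings (its clover).
import math

OMEGA = math.inf

def in_ideal(marking, omega_marking):
    """marking ∈ I[omega_marking] iff it lies componentwise below it."""
    return all(x <= y for x, y in zip(marking, omega_marking))

def covered(marking, clover):
    """marking ∈ I[clover] iff some ω-marking of the clover dominates it."""
    return any(in_ideal(marking, m) for m in clover)

# Hypothetical two-place net whose clover is {(1, ω), (3, 0)}.
clover = [(1, OMEGA), (3, 0)]
assert covered((1, 1000), clover)   # ω dominates any finite token count
assert covered((3, 0), clover)
assert not covered((2, 1), clover)
```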

*Example 1.* The marked net of Figure 1 is unbounded. Its Clover is the following set:

$$\{p\_i, p\_{bk} + p\_m, p\_l + p\_m + \omega p\_{ba}, p\_l + p\_{bk} + \omega p\_{ba} + \omega p\_c\}$$

For instance, the marking $p_l + p_{bk} + \alpha p_{ba} + \beta p_c$ is reached, and thus covered, by the sequence $t_1 t_5^{\alpha+\beta} t_6^{\beta}$.

**Fig. 1.** An unbounded Petri net

#### **2.2 Abstraction and acceleration**

In order to introduce abstractions and accelerations, we generalize transitions so that they may mark a place with ω tokens.

**Definition 4.** *Let* $P$ *be a set of places. An* ω*-transition* **a** *is defined by its precondition* $\mathbf{Pre}(\mathbf{a}) \in \mathbb{N}_\omega^P$ *and its incidence* $\mathbf{C}(\mathbf{a}) \in \mathbb{Z}_\omega^P$*.*

For the sake of homogeneity, one denotes $\mathbf{Pre}(\mathbf{a})(p)$ (resp. $\mathbf{C}(\mathbf{a})(p)$) by $\mathbf{Pre}(p, \mathbf{a})$ (resp. $\mathbf{C}(p, \mathbf{a})$). An ω-transition $\mathbf{a}$ is fireable from an ω-marking $\mathbf{m} \in \mathbb{N}_\omega^P$ if $\mathbf{m} \ge \mathbf{Pre}(\mathbf{a})$. When $\mathbf{a}$ is fireable from $\mathbf{m}$, its firing leads to the ω-marking $\mathbf{m}' \stackrel{\text{def}}{=} \mathbf{m} + \mathbf{C}(\mathbf{a})$, denoted as previously by $\mathbf{m} \xrightarrow{\mathbf{a}} \mathbf{m}'$. One observes that if $\mathbf{Pre}(p, \mathbf{a}) = \omega$ then, whatever the value of $\mathbf{C}(p, \mathbf{a})$, $\mathbf{m}'(p) = \omega$. So without loss of generality, one assumes that for every ω-transition $\mathbf{a}$, $\mathbf{Pre}(p, \mathbf{a}) = \omega$ implies $\mathbf{C}(p, \mathbf{a}) = \omega$.
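The ω-firing rule can be sketched as follows (our illustration, not from the paper; ω is encoded as `math.inf`, which already satisfies $\omega > n$ and $n + \omega = \omega$ for finite $n$):

```python
# Sketch: the firing rule for ω-transitions.  An ω-transition is a pair
# (Pre, C) of place-indexed tuples; ω is encoded as math.inf.
import math

OMEGA = math.inf

def fireable(m, pre):
    """a is fireable from m iff m ≥ Pre(a) componentwise."""
    return all(mi >= pi for mi, pi in zip(m, pre))

def fire(m, pre, c):
    """Firing leads to m + C(a); ω plus anything stays ω."""
    assert fireable(m, pre)
    return tuple(mi + ci for mi, ci in zip(m, c))

# A transition consuming one token in place 0 and marking place 1 with ω.
pre, c = (1, 0), (-1, OMEGA)
assert not fireable((0, 0), pre)
assert fire((2, 3), pre, c) == (1, OMEGA)
```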

In order to define abstractions, we first define the incidences of a sequence $\sigma$ of ω-transitions by recurrence on its length. As previously, we denote $\mathbf{Pre}(p, \sigma) \stackrel{\text{def}}{=} \mathbf{Pre}(\sigma)(p)$ and $\mathbf{C}(p, \sigma) \stackrel{\text{def}}{=} \mathbf{C}(\sigma)(p)$. The base case corresponds to the definition of an ω-transition. Let $\sigma = t\sigma'$, with $t$ an ω-transition and $\sigma'$ a sequence of ω-transitions; then:

$$\begin{array}{l} -\mathbf{C}(\sigma) = \mathbf{C}(t) + \mathbf{C}(\sigma');\\ -\text{ for all } p \in P \\ \bullet \text{ if } \mathbf{C}(p, t) = \omega \text{ then } \mathbf{Pre}(p, \sigma) = \mathbf{Pre}(p, t);\\ \bullet \text{ else } \mathbf{Pre}(p, \sigma) = \max(\mathbf{Pre}(p, t), \mathbf{Pre}(p, \sigma') - \mathbf{C}(p, t)). \end{array}$$

One checks by recurrence that $\sigma$ is fireable from $\mathbf{m}$ if and only if $\mathbf{m} \ge \mathbf{Pre}(\sigma)$, and in this case $\mathbf{m} \xrightarrow{\sigma} \mathbf{m} + \mathbf{C}(\sigma)$.
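The recurrence above can be transcribed directly; the following sketch (ours) computes $\mathbf{Pre}(\sigma)$ and $\mathbf{C}(\sigma)$ for a sequence of ω-transitions given as pairs of place-indexed tuples:

```python
# Sketch: Pre(σ) and C(σ) for a sequence σ of ω-transitions, following
# the recurrence of the paper; ω is encoded as math.inf.
import math

OMEGA = math.inf

def seq_incidences(sigma, nplaces):
    """Return (Pre(σ), C(σ)); σ is a list of (Pre, C) tuple pairs."""
    if not sigma:                       # empty sequence: Pre = C = 0
        zero = tuple(0 for _ in range(nplaces))
        return zero, zero
    (pre_t, c_t), rest = sigma[0], sigma[1:]
    pre_r, c_r = seq_incidences(rest, nplaces)
    # C(σ) = C(t) + C(σ')
    c = tuple(a + b for a, b in zip(c_t, c_r))
    # Pre(p,σ) = Pre(p,t) if C(p,t) = ω, else max(Pre(p,t), Pre(p,σ') - C(p,t))
    pre = tuple(pre_t[p] if c_t[p] == OMEGA
                else max(pre_t[p], pre_r[p] - c_t[p])
                for p in range(nplaces))
    return pre, c

t1 = ((2, 0), (-2, 1))   # consume 2 in p0, produce 1 in p1
t2 = ((0, 3), (0, -3))   # consume 3 in p1
assert seq_incidences([t1, t2], 2) == ((2, 2), (-2, -2))
```

On this example, $\mathbf{Pre}(\sigma) = (2, 2)$ is indeed the smallest marking from which $t_1$ and then $t_2$ can be fired in turn.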

An *abstraction* of a net is an ω-transition which concisely expresses the behaviour of the net w.r.t. covering (see Proposition 1). One observes that any transition $t$ of a net is by construction an abstraction (taking $\sigma_n = t$ for all $n$).

**Definition 5.** *Let* $\mathcal{N} = \langle P, T, \mathbf{Pre}, \mathbf{C}\rangle$ *be a Petri net and* **a** *be an* ω*-transition. Then* **a** *is an* abstraction *if for all* $n \ge 0$ *there exists* $\sigma_n \in T^*$ *such that for all* $p \in P$ *with* $\mathbf{Pre}(p, \mathbf{a}) \in \mathbb{N}$*:*

$$\mathbf{Pre}(p, \sigma_n) \le \mathbf{Pre}(p, \mathbf{a}) \;\wedge\; \mathbf{C}(p, \sigma_n) \ge \min(n, \mathbf{C}(p, \mathbf{a})).$$
The following proposition justifies the interest of abstractions.

**Proposition 1.** *Let* $(\mathcal{N}, \mathbf{m}_0)$ *be a marked Petri net,* **a** *be an abstraction and* **m** *be an* ω*-marking such that* $\mathbb{I}[\mathbf{m}] \subseteq \operatorname{Cover}(\mathcal{N}, \mathbf{m}_0)$ *and* $\mathbf{m} \xrightarrow{\mathbf{a}} \mathbf{m}'$*. Then* $\mathbb{I}[\mathbf{m}'] \subseteq \operatorname{Cover}(\mathcal{N}, \mathbf{m}_0)$*.*

**Proof.** Pick some $\mathbf{m}^* \in \mathbb{I}[\mathbf{m}']$. Denote $n = \max(\mathbf{m}^*(p) \mid \mathbf{m}'(p) = \omega)$ and $\ell = \max(\mathbf{Pre}(p, \sigma_n),\ n - \mathbf{C}(p, \sigma_n) \mid \mathbf{m}(p) = \omega)$. Let us define $\tilde{\mathbf{m}} \in \mathbb{I}[\mathbf{m}]$ by:

Let us check that $\sigma_n$ is fireable from $\tilde{\mathbf{m}}$. Let $p \in P$,

Let us show that $\tilde{\mathbf{m}} + \mathbf{C}(\sigma_n) \ge \mathbf{m}^*$. Let $p \in P$,


An easy way to build new abstractions consists in concatenating them.

**Proposition 2.** *Let* $\mathcal{N} = \langle P, T, \mathbf{Pre}, \mathbf{C}\rangle$ *be a Petri net and* $\sigma$ *be a sequence of abstractions. Then the* ω*-transition* **a** *defined by* $\mathbf{Pre}(\mathbf{a}) = \mathbf{Pre}(\sigma)$ *and* $\mathbf{C}(\mathbf{a}) = \mathbf{C}(\sigma)$ *is an abstraction.*

We now introduce the underlying concept of the Karp and Miller construction.

**Definition 6.** *Let* $\mathcal{N} = \langle P, T, \mathbf{Pre}, \mathbf{C}\rangle$ *be a Petri net. One says that* **a** *is an* acceleration *if* **a** *is an abstraction such that* $\mathbf{C}(\mathbf{a}) \in \{0, \omega\}^P$*.*

The following proposition provides a way to get an acceleration from an arbitrary abstraction.

**Proposition 3.** *Let* $\mathcal{N} = \langle P, T, \mathbf{Pre}, \mathbf{C}\rangle$ *be a Petri net and* **a** *be an abstraction. Define an* ω*-transition* $\mathbf{a}'$ *as follows. For all* $p \in P$*:*

**–** *If* $\mathbf{C}(p, \mathbf{a}) < 0$ *then* $\mathbf{Pre}(p, \mathbf{a}') = \mathbf{C}(p, \mathbf{a}') = \omega$*;*
**–** *If* $\mathbf{C}(p, \mathbf{a}) = 0$ *then* $\mathbf{Pre}(p, \mathbf{a}') = \mathbf{Pre}(p, \mathbf{a})$ *and* $\mathbf{C}(p, \mathbf{a}') = 0$*;*
**–** *If* $\mathbf{C}(p, \mathbf{a}) > 0$ *then* $\mathbf{Pre}(p, \mathbf{a}') = \mathbf{Pre}(p, \mathbf{a})$ *and* $\mathbf{C}(p, \mathbf{a}') = \omega$*.*

*Then* $\mathbf{a}'$ *is an acceleration.*

Let us study the set of accelerations more deeply. First we equip the set of ω-transitions with a "natural" order w.r.t. covering.

**Definition 7.** *Let* $P$ *be a set of places and* **a***,* $\mathbf{a}'$ *be two* ω*-transitions.*

$$\mathbf{a} \le \mathbf{a}' \text{ if } \operatorname{and} \operatorname{only} \operatorname{if} \mathbf{Pre}(\mathbf{a}) \le \mathbf{Pre}(\mathbf{a}') \land \mathbf{C}(\mathbf{a}) \ge \mathbf{C}(\mathbf{a}')$$

In other words, $\mathbf{a} \le \mathbf{a}'$ means that, given any ω-marking $\mathbf{m}$, if $\mathbf{a}'$ is fireable from $\mathbf{m}$ then $\mathbf{a}$ is also fireable, and its firing leads to a marking greater than or equal to the one reached by the firing of $\mathbf{a}'$.

**Proposition 4.** *Let* N *be a Petri net. Then the set of abstractions of* N *is upward closed. Similarly, the set of accelerations is upward closed in the set of* <sup>ω</sup>*-transitions whose incidence belongs to* {0, ω}<sup>P</sup> *.*

**Proposition 5.** *The set of accelerations of a Petri net is well ordered.*

**Proof.** The set of accelerations is a subset of $\mathbb{N}^P \times \{0, \omega\}^P$ (where $P$ is the set of places), equipped with the order obtained by iterating cartesian products of the sets $(\mathbb{N}, \le)$ and $(\{0, \omega\}, \ge)$. These sets are well ordered, and the cartesian product preserves this property, so we are done.

Since the set of accelerations is well ordered and upward closed, it is equal to the upward closure of the finite set of *minimal* accelerations. Let us study the size of a minimal acceleration. Given some Petri net, one denotes $d = |P|$ and $e = \max_{p,t}(\max(\mathbf{Pre}(p, t), \mathbf{Pre}(p, t) + \mathbf{C}(p, t)))$.

We are going to use the following result of Jérôme Leroux (published on HAL in June 2019), which bounds the lengths of shortest sequences between two mutually reachable markings $\mathbf{m}_1$ and $\mathbf{m}_2$.

**Theorem 1.** *(Theorem 2, [12]) Let* $\mathcal{N}$ *be a Petri net,* $\mathbf{m}_1, \mathbf{m}_2$ *be markings, and* $\sigma_1, \sigma_2$ *be sequences of transitions such that* $\mathbf{m}_1 \xrightarrow{\sigma_1} \mathbf{m}_2 \xrightarrow{\sigma_2} \mathbf{m}_1$*. Then there exist* $\sigma_1', \sigma_2'$ *such that* $\mathbf{m}_1 \xrightarrow{\sigma_1'} \mathbf{m}_2 \xrightarrow{\sigma_2'} \mathbf{m}_1$ *fulfilling:*

$$|\sigma_1'\sigma_2'| \le \|\mathbf{m}_1 - \mathbf{m}_2\|_\infty \, (3de)^{(d+1)^{2d+4}}$$

One deduces an upper bound on the size of minimal accelerations. Let $\mathbf{v} \in \mathbb{N}_\omega^P$; one denotes $\|\mathbf{v}\|_\infty = \max(\mathbf{v}(p) \mid \mathbf{v}(p) \in \mathbb{N})$.

**Proposition 6.** *Let* $\mathcal{N}$ *be a Petri net and* **a** *be a minimal acceleration. Then* $\|\mathbf{Pre}(\mathbf{a})\|_\infty \le e(3de)^{(d+1)^{2d+4}}$*.*

**Proof.** Let us consider the net $\mathcal{N}' = \langle P', T', \mathbf{Pre}', \mathbf{C}'\rangle$ obtained from $\mathcal{N}$ by deleting the set of places $\{p \mid \mathbf{Pre}(p, \mathbf{a}) = \omega\}$ and adding the set of transitions $T_1 = \{t_p \mid p \in P'\}$ with $\mathbf{Pre}(t_p) = p$ and $\mathbf{C}(t_p) = -p$. Observe that $d' \le d$ and $e' = e$.

One denotes $P_1 = \{p \mid \mathbf{Pre}(p, \mathbf{a}) < \omega = \mathbf{C}(p, \mathbf{a})\}$. One introduces the marking $\mathbf{m}_1$ obtained by restricting $\mathbf{Pre}(\mathbf{a})$ to $P'$, and $\mathbf{m}_2 = \mathbf{m}_1 + \sum_{p \in P_1} p$.

Let $\{\sigma_n\}_{n\in\mathbb{N}}$ be a family of sequences associated with $\mathbf{a}$. Let $n^* = \|\mathbf{Pre}(\mathbf{a})\|_\infty + 1$. Then $\sigma_{n^*}$ is fireable in $\mathcal{N}'$ from $\mathbf{m}_1$ and its firing leads to a marking that covers $\mathbf{m}_2$. By concatenating some occurrences of transitions of $T_1$, one gets a firing sequence $\mathbf{m}_1 \xrightarrow{\sigma_1} \mathbf{m}_2$ in $\mathcal{N}'$. Using the same process, one gets a firing sequence $\mathbf{m}_2 \xrightarrow{\sigma_2} \mathbf{m}_1$.

Let us apply Theorem 1. There exists a sequence $\sigma_1'$ with $\mathbf{m}_1 \xrightarrow{\sigma_1'} \mathbf{m}_2$ and $|\sigma_1'| \le (3de)^{(d+1)^{2d+4}}$, since $\|\mathbf{m}_1 - \mathbf{m}_2\|_\infty = 1$. By deleting the transitions of $T_1$ occurring in $\sigma_1'$, one gets a sequence $\sigma_1'' \in T^*$ such that $\mathbf{m}_1 \xrightarrow{\sigma_1''} \mathbf{m}_2' \ge \mathbf{m}_2$ with $|\sigma_1''| \le (3de)^{(d+1)^{2d+4}}$.

The ω-transition $\mathbf{a}'$, defined by $\mathbf{Pre}(p, \mathbf{a}') = \mathbf{Pre}(p, \sigma_1'')$ for all $p \in P'$, $\mathbf{Pre}(p, \mathbf{a}') = \omega$ for all $p \in P \setminus P'$, and $\mathbf{C}(\mathbf{a}') = \mathbf{C}(\mathbf{a})$, is an acceleration whose associated family is $\{(\sigma_1'')^n\}_{n\in\mathbb{N}}$. By definition of $\mathbf{m}_1$, $\mathbf{a}' \le \mathbf{a}$. Since $\mathbf{a}$ is minimal, $\mathbf{a}' = \mathbf{a}$. Observing that $|\sigma_1''| \le (3de)^{(d+1)^{2d+4}}$, one gets $\|\mathbf{Pre}(\mathbf{a})\|_\infty = \|\mathbf{Pre}(\mathbf{a}')\|_\infty \le e(3de)^{(d+1)^{2d+4}}$.

Thus given any acceleration, one can easily obtain a smaller acceleration whose (representation) size is exponential.

**Proposition 7.** *Let* N *be a Petri net and* **a** *be an acceleration. Then the* ω*-transition* trunc(**a**) *defined by:*

**– C**(trunc(**a**)) = **C**(**a**)*;*

**–** *for all* $p$ *such that* $\mathbf{Pre}(p, \mathbf{a}) \neq \omega$*,* $\mathbf{Pre}(p, \operatorname{trunc}(\mathbf{a})) = \min(\mathbf{Pre}(p, \mathbf{a}),\ e(3de)^{(d+1)^{2d+4}})$*;*

**–** *for all* p *such that* **Pre**(p, **a**) = ω*,* **Pre**(p, trunc(**a**)) = ω*.*

*is an acceleration.*

**Proof.** Let $\mathbf{a}' \le \mathbf{a}$ be a minimal acceleration. For all $p$ such that $\mathbf{Pre}(p, \mathbf{a}) \neq \omega$, $\mathbf{Pre}(p, \mathbf{a}') \le e(3de)^{(d+1)^{2d+4}}$. So $\mathbf{a}' \le \operatorname{trunc}(\mathbf{a})$. Since the set of accelerations is upward closed, one gets that $\operatorname{trunc}(\mathbf{a})$ is an acceleration.
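Truncation itself is equally direct; in this sketch (ours) the doubly exponential bound of Proposition 6 is passed as a parameter `bound`, since its actual value is astronomically large:

```python
# Sketch of trunc (Proposition 7): finite preconditions are clipped at
# the bound, ω entries and the incidence are kept unchanged.
import math

OMEGA = math.inf

def trunc(pre, c, bound):
    """Truncate an acceleration (pre, c) at the given precondition bound."""
    new_pre = tuple(p if p == OMEGA else min(p, bound) for p in pre)
    return new_pre, c

pre, c = (5, OMEGA, 100), (0, OMEGA, OMEGA)
assert trunc(pre, c, 10) == ((5, OMEGA, 10), (0, OMEGA, OMEGA))
```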

## **3 A coverability tree algorithm**

#### **3.1 Specification and illustration**

As discussed in the introduction, to compute the clover of a Petri net, most algorithms build coverability trees (or graphs), which are variants of the Karp and Miller tree, with the aim of reducing the peak memory during the execution. The seminal algorithm [6] differs from the KMT construction in one main respect: when it finds that the marking associated with the current vertex strictly covers the marking of another vertex, it deletes the subtree rooted at that vertex, and when the current vertex belongs to the removed subtree, it is substituted for the root of the deleted subtree. This operation drastically reduces the peak memory but, as shown in [8], makes the algorithm incomplete.

Like the previous algorithms that ensure completeness with deletions, our algorithm also needs additional memory. However, unlike the other algorithms, it memorizes accelerations instead of ω-markings. This approach has two advantages. First, we are able to exhibit a doubly exponential theoretical upper bound on the additional memory, while no such bound is known for the other algorithms. Furthermore, accelerations are reused in the construction and thus may even shorten the execution time and reduce the peak space w.r.t. the algorithm in [6].

Before we delve into a high-level description of this algorithm, let us present some of the variables, functions, and definitions it uses. Algorithm 1, denoted from now on as MinCov, takes as input a marked net $(\mathcal{N}, \mathbf{m}_0)$ and constructs a directed labeled tree $CT = (V, E, \lambda, \delta)$ and a set $Acc$ of ω-transitions (which by Lemma 2 are accelerations). Each $v \in V$ is labeled by an ω-marking $\lambda(v) \in \mathbb{N}_\omega^P$. Since $CT$ is a directed tree, every vertex $v \in V$ except the root $r$ has a predecessor, denoted $\operatorname{prd}(v)$, and a set of descendants, denoted $\operatorname{Des}(v)$. By convention, $\operatorname{prd}(r) = r$. Each edge $e \in E$ is labeled by a firing sequence $\delta(e) \in T \cdot Acc^*$, consisting of an ordinary transition followed by a sequence of accelerations (which by Lemma 1 fulfills $\lambda(\operatorname{prd}(v)) \xrightarrow{\delta(\operatorname{prd}(v),v)} \lambda(v)$). In addition, again by Lemma 1, $\mathbf{m}_0 \xrightarrow{\delta(r,r)} \lambda(r)$. For a path $\gamma = e_1 e_2 \ldots e_k \in E^*$ in the tree, we denote $\delta(\gamma) := \delta(e_1)\delta(e_2)\ldots\delta(e_k) \in (T \cup Acc)^*$. The subset $Front \subseteq V$ is the set of vertices 'to be processed'.

MinCov may call the function Delete(v), which removes from $V$ a leaf $v$ of $CT$, and the function Prune(v), which removes from $V$ all descendants of $v \in V$ except $v$ itself, as illustrated in the following figure:

First, MinCov performs some initializations, setting the tree $CT$ to a single vertex $r$ with marking $\lambda(r) = \mathbf{m}_0$ and $Front = \{r\}$. Afterwards the main loop builds the tree; each iteration processes some vertex of $Front$ as follows.

MinCov picks a vertex $u \in Front$ (line 3). From $\lambda(u)$, MinCov fires a sequence $\sigma \in Acc^*$ reaching some $\mathbf{m}_u$ that maximizes the number of ω's produced, i.e. $|\{p \in P \mid \lambda(u)(p) \neq \omega \wedge \mathbf{m}_u(p) = \omega\}|$. Thus in $\sigma$ no acceleration occurs twice, and its length is bounded by $|P|$. Then MinCov updates $\lambda(u)$ with $\mathbf{m}_u$ (line 5) and the label of the edge incoming to $u$ by concatenating $\sigma$. Afterwards it performs one of the following actions according to the marking $\lambda(u)$:
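This acceleration-firing step can be sketched as a greedy saturation (our illustration, with a hypothetical acceleration in the example): since every useful firing of an acceleration only adds ω's, and added ω's can only enable further accelerations, repeatedly firing any acceleration that changes the marking reaches the maximal set of ω's:

```python
# Sketch of the acceleration saturation step: from a marking m, greedily
# fire any enabled acceleration that produces a new ω.  Accelerations
# have incidence in {0, ω}^P, so any change means at least one new ω,
# and no acceleration is usefully fired twice.
import math

OMEGA = math.inf

def fireable(m, pre):
    return all(mi >= pi for mi, pi in zip(m, pre))

def saturate(m, accelerations):
    """Return (m', σ) where σ is a maximal sequence of accelerations."""
    m, sigma = tuple(m), []
    changed = True
    while changed:
        changed = False
        for name, pre, c in accelerations:
            if fireable(m, pre):
                m2 = tuple(mi + ci for mi, ci in zip(m, c))
                if m2 != m:             # a new ω was produced
                    m, changed = m2, True
                    sigma.append(name)
    return m, sigma

# One hypothetical acceleration pumping place 1 once place 0 holds a token.
acc = [("a1", (1, 0), (0, OMEGA))]
assert saturate((1, 0), acc) == ((1, OMEGA), ["a1"])
assert saturate((0, 0), acc) == ((0, 0), [])
```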


For a detailed example of a run of the algorithm see Example 2 in [9].

#### **3.2 Correctness Proof**

We now establish the correctness of Algorithm 1 by proving the following properties (where, for all $W \subseteq V$, $\lambda(W)$ denotes $\{\lambda(v) \mid v \in W\}$):


We get termination by using the well order of $\mathbb{N}_\omega^P$ and König's Lemma.

### **Proposition 8.** M*inCov terminates.*

**Proof.** Consider the following variation of the algorithm.

Instead of deleting the current vertex when its marking is smaller than or equal to the marking of another vertex, one marks it as 'cut' and extracts it from Front.

Instead of cutting a subtree when the marking of the current vertex v is greater than the marking of a vertex which is not an ancestor of v, one marks its vertices as 'cut' and extracts from Front those that are inside it.

Instead of cutting a subtree when the marking of the current vertex v is greater than the marking of a vertex which is an ancestor of v, say v∗, one marks those on the path from v<sup>∗</sup> to v (except v) as 'accelerated', one marks the other vertices

**Algorithm 1:** Computing the minimal coverability set

```
MinCov(N, m0)
  Input: A marked Petri net (N, m0)
  Data: V set of vertices; E ⊆ V × V; Front ⊆ V; λ : V → N_ω^P;
        δ : E → T·Acc*; CT = (V, E, λ, δ) a labeled tree;
        Acc a set of ω-transitions
  Output: A labeled tree CT = (V, E, λ, δ)
 1  V ← {r}; E ← ∅; Front ← {r}; λ(r) ← m0; Acc ← ∅; δ(r, r) ← ε
 2  while Front ≠ ∅ do
 3    Select u ∈ Front
 4    Let σ ∈ Acc* be a maximal fireable sequence of accelerations from λ(u)
        // maximal w.r.t. the number of ω's produced
 5    λ(u) ← λ(u) + C(σ)
 6    δ((prd(u), u)) ← δ((prd(u), u)) · σ
 7    if ∃u' ∈ V \ Front s.t. λ(u') ≥ λ(u) then Delete(u)  // λ(u) is covered
 8    else if ∃u' ∈ Anc(u) s.t. λ(u) > λ(u') then
        // an acceleration was found between u and one of u's ancestors
 9      Let γ ∈ E* be the path from u' to u in CT
10      a ← NewAcceleration()
11      foreach p ∈ P do
12        if C(p, δ(γ)) < 0 then Pre(p, a) ← ω; C(p, a) ← ω
13        if C(p, δ(γ)) = 0 then Pre(p, a) ← Pre(p, δ(γ)); C(p, a) ← 0
14        if C(p, δ(γ)) > 0 then Pre(p, a) ← Pre(p, δ(γ)); C(p, a) ← ω
15      end
16      a ← trunc(a); Acc ← Acc ∪ {a}; Prune(u'); Front ← Front ∪ {u'}
17    else
18      for u' ∈ V do  // remove vertices labeled by markings covered by λ(u)
19        if λ(u') < λ(u) then Prune(u'); Delete(u')
20      end
21      Front ← Front \ {u}
22      foreach t ∈ T s.t. λ(u) ≥ Pre(t) do  // add the children of u
23        u' ← NewNode(); V ← V ∪ {u'}; Front ← Front ∪ {u'}; E ← E ∪ {(u, u')}
24        λ(u') ← λ(u) + C(t); δ((u, u')) ← t
25      end
26    end
27  end
28  return CT
```

of the subtree as 'cut', and inserts v again in Front with the marking of v∗. All the vertices of the subtree that are in Front are extracted from it.

All the vertices marked as 'cut' or 'accelerated' are ignored for comparisons and for discovering accelerations. This alternative algorithm behaves as the original one except that the size of the tree never decreases, and so if the algorithm does not terminate then the tree is infinite. Since this tree is finitely branching, by König's Lemma it contains an infinite path. On this infinite path, no vertex can be marked as 'cut', since it would belong to a finite subtree. Observe that the marking labelling the vertex following an accelerated subpath has at least one more ω than the marking of the first vertex of this subpath. So there is an infinite subpath of unmarked vertices in V. But $\mathbb{N}_\omega^P$ is well ordered, so there should be two vertices $v$ and $v'$, where $v'$ is a descendant of $v$ with $\lambda(v') \ge \lambda(v)$, which contradicts the behaviour of the algorithm.

Since we are going to use recurrence on the number of iterations of the main loop of Algorithm 1, we introduce the following notations: $CT_n = (V_n, E_n, \lambda_n, \delta_n)$, $Front_n$, and $Acc_n$ are the values of the variables $CT$, $Front$, and $Acc$ at line 2 after $n$ iterations have been executed.

**Proposition 9.** *For all* $n \in \mathbb{N}$*,* $\lambda(V_n \setminus Front_n)$ *is an antichain. Thus on termination,* $\lambda(V)$ *is an antichain.*

**Proof.** Let us introduce $V' := V \setminus Front$ and $V'_n := V_n \setminus Front_n$. We are going to prove by induction on the number $n$ of iterations of the while-loop that $\lambda(V'_n)$ is an antichain. MinCov initializes the variables $V$ and $Front$ at line 1. So $V_0 = \{r\}$ and $Front_0 = \{r\}$, therefore $V'_0 = V_0 \setminus Front_0 = \emptyset$ and $\lambda(V'_0)$ is an antichain.

Assume that $\lambda(V'_n)$, with $V'_n = V_n \setminus Front_n$, is an antichain. Modifying $V'$ can be done by *adding* or *removing* vertices from $V$ and by *removing* vertices from $Front$ while keeping them in $V$. The actions by which MinCov may modify the sets $V$ and $Front$ are: Delete (lines 7 and 19), Prune (lines 16 and 19), adding vertices to $V$ (line 23), adding vertices to $Front$ (lines 16 and 23), and removing vertices from $Front$ (line 21).

• Neither Delete nor Prune adds new vertices to $V'$. Thus the antichain property is preserved.

• MinCov may add vertices to $V$ only at line 23, where it simultaneously adds them to $Front$, and therefore it does not add new vertices to $V'$. Thus the antichain property is preserved.

• Adding vertices to $Front$ may only remove vertices from $V'_n$. Thus the antichain property is preserved.

• MinCov can only add a vertex to $V'$ when it removes it from $Front$ while keeping it in $V$. This is done only at line 21, and the only vertex MinCov may remove there is the working vertex $u$. However, if (in the iteration) MinCov reaches line 21 then it did not execute line 7; hence (1) all markings of $\lambda(V'_n) \subseteq \lambda(V_n)$ are either smaller than or incomparable to $\lambda_{n+1}(u)$. Moreover, MinCov has also executed lines 18-20, where (2) it performs Delete on all vertices $u' \in V'_n \subseteq V_n$ with $\lambda_n(u') < \lambda_{n+1}(u)$. Let us denote by $V''_n \subseteq V'_n$ the set $V \setminus Front$ at the end of line 20. Due to (1) and (2), the marking $\lambda_{n+1}(u)$ is incomparable to any marking in $\lambda_{n+1}(V''_n)$. Since $V''_n \subseteq V'_n$, $\lambda_{n+1}(V''_n)$ is an antichain. Combining this fact with the incomparability between $\lambda_{n+1}(u)$ and any marking in $\lambda_{n+1}(V''_n)$, we conclude that $\lambda_{n+1}(V'_{n+1}) = \lambda_{n+1}(V''_n) \cup \{\lambda_{n+1}(u)\}$ is an antichain.

In order to establish consistency, we prove that the labelling of vertices and edges is compatible with the firing rule and that Acc is a set of accelerations.

**Lemma 1.** *For all* $n \in \mathbb{N}$ *and all* $u \in V_n \setminus \{r\}$*,* $\lambda_n(\operatorname{prd}(u)) \xrightarrow{\delta(\operatorname{prd}(u),u)} \lambda_n(u)$*, and* $\mathbf{m}_0 \xrightarrow{\delta(r,r)} \lambda_n(r)$*.*

**Proof.** Let us prove by induction on the number $n$ of iterations of the main loop that for all $v \in V_n$ the assertions of the lemma hold. Initially, $V_0 = \{r\}$ and $\lambda_0(r) = \mathbf{m}_0$. Since $\mathbf{m}_0 \xrightarrow{\varepsilon} \mathbf{m}_0 = \lambda_0(r)$, the base case is established.

Assume that the assertions hold for $CT_n$. Observe that MinCov may change the labeling function $\lambda$ and/or add new vertices in exactly two places: at lines 4-6 and at lines 22-25. Therefore, in order to prove the assertion, we show that it still holds after each group of lines.

• After lines 4-6: MinCov computes (1) a maximal fireable sequence $\sigma \in Acc_n^*$ from $\lambda_n(u)$ (line 4), and updates $u$'s marking to $\mathbf{m}_u = \lambda_n(u) + \mathbf{C}(\sigma)$ (line 5). Since the assertions hold for $CT_n$, (2) if $u \neq r$ then $\lambda_n(\operatorname{prd}(u)) \xrightarrow{\delta(\operatorname{prd}(u),u)} \lambda_n(u)$, else $\mathbf{m}_0 \xrightarrow{\delta(r,r)} \lambda_n(r)$. By concatenation, we get $\lambda_n(\operatorname{prd}(u)) \xrightarrow{\delta(\operatorname{prd}(u),u)\sigma} \mathbf{m}_u$ if $u \neq r$, and otherwise $\mathbf{m}_0 \xrightarrow{\delta(r,r)\sigma} \mathbf{m}_u$, which establishes that the assertions hold after line 6.

• After lines 22-25: the vertices for which $\lambda$ is updated at these lines are the children of $u$ that are added to the tree. For every transition $t \in T$ fireable from $\lambda(u)$, MinCov creates a child $v_t$ of $u$ (lines 22-23). The marking of each child $v_t$ is set to $\lambda_{n+1}(v_t) := \lambda_{n+1}(u) + \mathbf{C}(t)$ (line 24). Therefore, since $\lambda_{n+1}(u) \xrightarrow{t} \lambda_{n+1}(v_t)$, the assertions hold.

## **Lemma 2.** *At any execution point of* MinCov*,* Acc *is a set of accelerations.*

**Proof.** At most one acceleration is added per iteration. Let us prove by induction on the number n of iterations of the main loop that Acc<sup>n</sup> is a set of accelerations. Since Acc<sup>0</sup> = ∅, the base case is straightforward.

Assume that $Acc_n$ is a set of accelerations and consider $Acc_{n+1}$. In an iteration, MinCov may add an ω-transition **a** to $Acc$. Due to the inductive hypothesis, $\delta(\gamma)$ is a sequence of abstractions, where $\gamma$ is defined at line 9. Consider **b**, the ω-transition defined by $\mathbf{Pre}(\mathbf{b}) = \mathbf{Pre}(\delta(\gamma))$ and $\mathbf{C}(\mathbf{b}) = \mathbf{C}(\delta(\gamma))$. Due to Proposition 2, **b** is an abstraction. Due to Proposition 3, the loop of lines 11-15 transforms **b** into an acceleration **a**. Due to Proposition 7, after truncation at line 16, **a** is still an acceleration.

**Proposition 10.** $\mathbb{I}[\lambda(V)] \subseteq \operatorname{Cover}(\mathcal{N}, \mathbf{m}_0)$*.*

**Proof.** Let $v \in V$. Consider the path $u_0, \ldots, u_k$ of $CT$ from the root $r = u_0$ to $u_k = v$. Let $\sigma \in (T \cup Acc)^*$ denote $\delta(\operatorname{prd}(u_0), u_0) \cdots \delta(\operatorname{prd}(u_k), u_k)$. Due to Lemma 1, $\mathbf{m}_0 \xrightarrow{\sigma} \lambda(v)$. Due to Lemma 2, $\sigma$ is a sequence of abstractions. Due to Proposition 2, the ω-transition **a** defined by $\mathbf{Pre}(\mathbf{a}) = \mathbf{Pre}(\sigma)$ and $\mathbf{C}(\mathbf{a}) = \mathbf{C}(\sigma)$ is an abstraction. Due to Proposition 1, $\mathbb{I}[\lambda(v)] \subseteq \operatorname{Cover}(\mathcal{N}, \mathbf{m}_0)$.

The following definitions are related to an arbitrary execution point of MinCov and are introduced to establish its completeness.

**Definition 8.** *Let* $\sigma = \sigma_0 t_1 \sigma_1 \ldots t_k \sigma_k$ *with, for all* $i$*,* $t_i \in T$ *and* $\sigma_i \in Acc^*$*. Then the firing sequence* $\mathbf{m} \xrightarrow{\sigma} \mathbf{m}'$ *is an* exploring sequence *if:*


**Definition 9.** *Let* **<sup>m</sup>** *be a marking. Then* **<sup>m</sup>** *is* quasi-covered *if:*


In order to prove completeness of the algorithm, we want to prove that at the beginning of every iteration, any $\mathbf{m} \in \operatorname{Cover}(\mathcal{N}, \mathbf{m}_0)$ is quasi-covered. To establish this assertion, we introduce several lemmas showing that this assertion is preserved by certain actions of the algorithm, under some prerequisites. More precisely, Lemma 3 corresponds to the deletion of the current vertex, Lemma 4 to the discovery of an acceleration, Lemma 5 to the deletion of a subtree whose root marking is smaller than the marking of the current vertex, and Lemma 6 to the creation of the children of the current vertex.

**Lemma 3.** *Let* $CT$*,* $Front$*, and* $Acc$ *be the values of the corresponding variables at some execution point of* MinCov*, and let* $u \in V$ *be a leaf of* $CT$ *such that the following items hold:*


*Then all* **<sup>m</sup>** <sup>∈</sup> Cover(<sup>N</sup> , **<sup>m</sup>**0) *are quasi-covered after performing* Delete(u)*.*

**Lemma 4.** *Let* $CT$*,* $Front$*, and* $Acc$ *be the values of the corresponding variables at some execution point of* MinCov*, and let* $u \in V$ *be such that the following items hold:*


*Then all* **<sup>m</sup>** <sup>∈</sup> Cover(<sup>N</sup> , **<sup>m</sup>**0) *are quasi-covered after performing* Prune(u) *and then adding* u *to* Front*.*

**Lemma 5.** *Let* $CT$*,* $Front$*, and* $Acc$ *be the values of the corresponding variables at some execution point of* MinCov*, and let* $u \in Front$ *and* $u' \in V$ *be such that the following items hold:*


*Then after performing* Prune(u')*;* Delete(u')*,*


**Lemma 6.** *Let* $CT$*,* $Front$*, and* $Acc$ *be the values of the corresponding variables at some execution point of* MinCov*, and let* $u \in Front$ *be such that the following items hold:*


*Then after removing* $u$ *from* $Front$ *and, for all* $t \in T$ *fireable from* $\lambda(u)$*, adding a child* $v_t$ *of* $u$ *to* $Front$ *with marking* $\lambda(v_t) = \lambda(u) + \mathbf{C}(t)$*, all* $\mathbf{m} \in \operatorname{Cover}(\mathcal{N}, \mathbf{m}_0)$ *are quasi-covered.*

**Proposition 11.** *At the beginning of every iteration, all **m** ∈ Cover(N, **m**₀) are quasi-covered.*

**Proof.** Let us prove by induction on the number of iterations that all **m** ∈ Cover(N, **m**₀) are quasi-covered.

Let us consider the base case. MinCov initializes V and Front to {r} and λ(r) to **m**₀. By definition, for all **m** ∈ Cover(N, **m**₀) there exists σ = t₁t₂ ··· tₖ ∈ T* such that **m**₀ −σ→ **m**′ ≥ **m**. Since V \ Front = ∅, this firing sequence is an exploring sequence.

Assume that all **m** ∈ Cover(N, **m**₀) are quasi-covered at the beginning of some iteration. Let us examine what may happen during the iteration. In lines 4-6, MinCov computes the maximal fireable sequence σ ∈ Accₙ* from λₙ(u) (line 4) and sets u's marking to **m**ᵤ := λₙ(u) + **C**(σ) (line 5). Afterwards, there are three possible cases: (1) **m**ᵤ is covered by some marking associated with a vertex out of Front, (2) an acceleration is found, or (3) MinCov computes the successors of u and removes u from Front.

**Line 7.** MinCov calls Delete(u), so CTₙ₊₁ is obtained by deleting u. Moreover, λ(u′) ≥ **m**ᵤ. Let us check the hypotheses of Lemma 3. Assertion 1 follows by induction since (1) the only change in the data is the increase of λ(u) by firing some accelerations and (2) u belongs to Front, so it cannot cover intermediate markings of exploring sequences. Assertion 2 follows from Proposition 9 since V \ Front is unchanged. Assertion 3 follows immediately from lines 4-6. Assertion 4 follows with v = u′. Thus, using this lemma, the induction is proved in this case.


Lines 21-25 correspond to the operations related to Lemma 6. Thus, using this lemma, the induction is proved in this case.

The completeness of MinCov is an immediate consequence of the previous proposition.

**Corollary 1.** *When MinCov terminates,* Cover(N, **m**₀) ⊆ ↓λ(V)*.*

**Proof.** By Proposition 11, all **m** ∈ Cover(N, **m**₀) are quasi-covered. Since Front is empty on termination, for all **m** ∈ Cover(N, **m**₀) there exists v ∈ V such that **m** ≤ λ(v).
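The objects manipulated by this proof (ω-markings, covering, and the clover as the set of maximal ω-markings) can be illustrated concretely. The following Python sketch is a naive Karp–Miller-style construction, not the MinCov algorithm itself (it maintains no accelerations and no frontier); the two-place net is a hypothetical example chosen so that one place is unbounded.

```python
OMEGA = float("inf")  # stands for ω in ω-markings

# A hypothetical two-place net. Each transition is a (pre, post) pair of
# vectors over the places (p0, p1). t0 needs a token in p0 (and returns it)
# while adding a token to p1, so p1 is unbounded; t1 consumes the p0 token.
TRANSITIONS = [
    ((1, 0), (1, 1)),  # t0
    ((1, 0), (0, 0)),  # t1
]

def leq(m1, m2):
    return all(a <= b for a, b in zip(m1, m2))

def fire(m, t):
    """Successor ω-marking, or None if t is not fireable from m."""
    pre, post = t
    if not leq(pre, m):
        return None
    return tuple(x - p + q for x, p, q in zip(m, pre, post))

def accelerate(m, ancestors):
    """If an ancestor is strictly dominated by m, set every strictly
    increased component to ω (the classic Karp-Miller acceleration)."""
    for a in ancestors:
        if leq(a, m) and a != m:
            m = tuple(OMEGA if x > y else x for x, y in zip(m, a))
    return m

def minimal_coverability_set(m0):
    seen = set()
    def explore(m, ancestors):
        m = accelerate(m, ancestors)
        if m in seen:
            return
        seen.add(m)
        for t in TRANSITIONS:
            m2 = fire(m, t)
            if m2 is not None:
                explore(m2, ancestors + [m])
    explore(m0, [])
    # The clover: keep only the maximal ω-markings.
    return {m for m in seen if not any(o != m and leq(m, o) for o in seen)}

clover = minimal_coverability_set((1, 0))
print(clover)  # the unbounded place p1 shows up as ω (inf)
```

On this net the clover is the single ω-marking (1, ω): every coverable marking is dominated by it, which is exactly the covering relation **m** ≤ λ(v) used in the corollary.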

## **4 Tool and benchmarks**

In order to empirically evaluate our algorithm, we have implemented a prototype tool which computes the clover and solves the coverability problem. This tool is developed in Python, using the NumPy library. It can be found on GitHub<sup>3</sup>. All benchmarks were performed on a computer equipped with an Intel i5-8250U CPU with 4 cores, 16GB of memory, and Ubuntu Linux 18.04.

**Minimal coverability set.** We compare MinCov with the tool MP [14], the tool VH [16], and the tool CovProc [10]. We have also implemented the (incomplete) minimal coverability tree algorithm, denoted AF, in order to measure the additional memory needed by the (complete) tools. Both the MP and VH tools were kindly sent to us by their authors. The tool MP has an implementation

<sup>3</sup> https://github.com/IgorKhm/MinCov

in Python and another in C**++**. For the comparison we selected the Python one, to avoid biases due to the programming language.

We ran two kinds of benchmarks: (1) 123 standard benchmarks from the literature (taken from [2]) in Table 1, and (2) 100 randomly generated Petri nets, also in Table 1, since the benchmarks from the literature do not present all the features that lead to infinite state systems. These random Petri nets have the following properties: (1) 50 < |P|, |T| < 100, (2) the number of places connected to each transition is bounded by 10, and (3) they are not structurally bounded. The execution time of the tools was limited to 900 seconds.
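A generator along these lines can be sketched as follows. This is a hypothetical reconstruction, not the authors' generator; property (3) is obtained by construction rather than checked, by including one transition whose effect is nonnegative and nonzero, which rules out any positive place weighting x with x·C ≤ 0.

```python
import random

def random_net(seed):
    """Sketch of a random Petri net generator with the stated constraints:
    (1) 50 < |P|, |T| < 100, (2) each transition touches at most 10 places,
    (3) structural unboundedness forced by one "pumping" transition whose
    effect is nonnegative and nonzero (so x.C > 0 for every positive x)."""
    rng = random.Random(seed)
    n_places = rng.randint(51, 99)
    n_transitions = rng.randint(51, 99)
    transitions = []
    for _ in range(n_transitions - 1):
        places = rng.sample(range(n_places), rng.randint(1, 10))
        pre = {p: rng.randint(0, 2) for p in places}
        post = {p: rng.randint(0, 2) for p in places}
        transitions.append((pre, post))
    # The pumping transition: fireable everywhere, strictly produces tokens.
    transitions.append(({}, {rng.randrange(n_places): 1}))
    return n_places, transitions

n_places, transitions = random_net(seed=42)
assert 50 < n_places < 100 and 50 < len(transitions) < 100
assert all(len(set(pre) | set(post)) <= 10 for pre, post in transitions)
```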

Table 1 contains a summary of all the instances of the benchmarks. The first column shows the number of instances on which the tool timed out. The time column consists of the total time on instances that did not time out plus 900 seconds for any instance that led to a time out. The #Nodes column consists of the peak number of nodes in instances that did not time out on any of the tools (except CovProc which does not provide this number). For MinCov we take the peak number of nodes plus accelerations. In the benchmarks from the literature


we observed that the instances that timed out on MinCov are included in those of AF and MP. However, there were instances that timed out on VH but not on MinCov, and vice versa. MinCov is the second fastest tool; compared to VH, it is 1.2 times slower. A possible explanation is that VH is implemented in C**++**. As could be expected, w.r.t. memory requirements MinCov has the least number of nodes. In the benchmarks from the literature, MinCov has approximately 10 times fewer nodes than MP and 1.6 times fewer than VH. In the random benchmarks these ratios are significantly higher.

**Coverability.** We compare MinCov to the tool qCover [2] on the set of benchmarks from the literature in Table 2. In [2], qCover is compared to the most competitive tools for coverability and achieves a score of 142 solved instances, while the second best tool achieves a score of 122. We split the results into safe instances (not coverable) and unsafe ones (coverable). In both categories we counted the number of instances on which the tools failed (columns T/O) and the total time (columns Time), as in Table 1.

We observed that the tools are complementary, i.e. qCover is faster at proving that an instance is safe and MinCov is faster at proving that an instance is unsafe.


**Table 2.** Benchmarks for the coverability problem (60 unsafe and 115 safe)

Therefore, by splitting the processing time between them we get better results. The third row of Table 2 represents a parallel execution of the tools, where the time for each instance is computed as follows:

Time(MinCov ∥ qCover) = 2 · min(Time(MinCov), Time(qCover)) .

Combining both tools is 2.5 times faster than qCover and 3.5 times faster than MinCov, which confirms the above statement. We could get still better results by dynamically deciding what ratio of CPU time to allot to each tool, depending on some predicted status of the instance.
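The accounting behind such factors can be made explicit. In the sketch below the per-instance times are invented purely to illustrate the complementarity (MinCov fast on unsafe instances, qCover fast on safe ones); they are not measurements from the benchmarks.

```python
def combined_time(t_mincov, t_qcover):
    """Idealized even CPU split: each tool runs at half speed and the run
    stops when the first one answers, hence twice the best solo time."""
    return 2 * min(t_mincov, t_qcover)

# Invented (MinCov, qCover) times on four instances: two unsafe instances
# where MinCov answers quickly, two safe ones where qCover does.
times = [(3.0, 200.0), (5.0, 350.0), (400.0, 4.0), (650.0, 2.0)]
solo_mincov = sum(t for t, _ in times)                 # 1058.0
solo_qcover = sum(t for _, t in times)                 #  556.0
combined = sum(combined_time(a, b) for a, b in times)  #   28.0
assert combined < solo_qcover < solo_mincov
```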

## **5 Conclusion**

We have proposed a simple and efficient modification of the incomplete minimal coverability tree algorithm for building the clover of a net. Our algorithm is based on the introduction of the concepts of covering abstractions and accelerations. Compared to the alternative algorithms previously designed, we have theoretically bounded the size of the additional space. Furthermore we have implemented a prototype which is already very competitive.

From a theoretical point of view, we plan to study how abstractions and accelerations could be defined in the more general context of well-structured transition systems. From an experimental point of view, we will follow three directions in order to increase the performance of our tool. First, as in [13], we have to select appropriate data structures to minimize the number of comparisons between ω-markings. Then we want to precompute a set of accelerations using linear programming, as the correctness of the algorithm is preserved and the efficiency could be significantly improved. Last, we want to take advantage of parallelism in a more general way than simultaneously running several tools.

## **References**



## Constructing Infinitary Quotient-Inductive Types

Marcelo P. Fiore, Andrew M. Pitts, and S. C. Steenkamp (✉)

> Department of Computer Science and Technology University of Cambridge, Cambridge CB3 0FD, UK s.c.steenkamp@cl.cam.ac.uk

Abstract This paper introduces an expressive class of quotient-inductive types, called QW-types. We show that in dependent type theory with uniqueness of identity proofs, even the infinitary case of QW-types can be encoded using the combination of inductive-inductive definitions involving strictly positive occurrences of Hofmann-style quotient types, and Abel's sized types. The latter, which provide a convenient constructive abstraction of what classically would be accomplished with transfinite ordinals, are used to prove termination of the recursive definitions of the elimination and computation properties of our encoding of QW-types. The development is formalized using the Agda theorem prover.

Keywords: dependent type theory · higher inductive types · inductive-inductive definitions · quotient types · sized types · category theory

## 1 Introduction

One of the key features of proof assistants based on dependent type theory such as Agda, Coq and Lean is their support for inductive definitions of families of types. Homotopy Type Theory [29] introduces a potentially very useful extension of the notion of inductive definition, the higher inductive types (HITs). To define an ordinary inductive type one declares how its elements are constructed. To define a HIT one not only declares element constructors, but also declares equality constructors in identity types (possibly iterated ones), specifying how the constructed elements and identities are to be equated. In this paper we work in a dependent type theory satisfying uniqueness of identity proofs (UIP), so that identity types are trivial in dimensions higher than one. Nevertheless, as Altenkirch and Kaposi [5] point out, HITs are still useful in such a one-dimensional setting. They introduce the term quotient inductive type (QIT) for this truncated form of HIT.

Figure 1 gives two examples of QITs, using Agda-style notation for dependent type theory; in particular, Set denotes a universe of types and ≡ denotes the identity type. The first example specifies the element and equality constructors for the type Bag X of finite multisets of elements from a type X. The second example, adapted from [5], specifies the element and equality constructors for the type ωTree X of trees whose nodes are labelled with elements of X and that have unordered countably infinite branching. Both examples illustrate the nice feature

Finite multisets:

```
data Bag(X : Set) : Set where
    [] : Bag X
    _::_ : X → Bag X → Bag X
    swap : (x y : X)(ys : Bag X) → x :: y :: ys ≡ y :: x :: ys
```
Unordered countably branching trees (elements of isIso f witness that f is a bijection):

```
data ωTree(X : Set) : Set where
    leaf : ωTree X
    node : X → (ℕ → ωTree X) → ωTree X
    perm : (x : X)(f : ℕ → ℕ)(_ : isIso f)(g : ℕ → ωTree X)
         → node x g ≡ node x (g ◦ f)
```

Figure 1. Two examples of QITs

of QITs that users only have to specify the particular identifications between data needed for their applications. Thus the standard property of equality that it is an equivalence relation respecting the constructors is inherited by construction from the usual properties of identity types, without the need to say so in the declaration of the QIT.

The second example also illustrates a more technical aspect of QITs, that they enable constructive versions of structures that classically use non-constructive choice principles. The first example in Figure 1 only involves element constructors of finite arity ([] is nullary and x :: \_ is unary) and consequently Bag X is isomorphic to the type obtained from the ordinary inductive type of finite lists over X by quotienting by the congruence generated by swap. Of course this assumes, as we do in this paper, that the type theory comes with Hofmann-style quotient types [18, Section 3.2.6.1]. By contrast, the second example in the figure involves an element constructor with countably infinite arity. So if one first forms the ordinary inductive type of ordered countably branching trees (by dropping the equality constructor perm from the declaration) and then quotients by a suitable relation to get the equalities specified by perm, one needs the axiom of countable choice to be able to lift the node element constructor to the quotient; see [5, Section 2.2] for a detailed discussion. The construction of the Cauchy reals as a higher inductive-inductive type [29, Section 11.3] provides a similar, but more complicated example where use of countable choice is avoided. Such examples have led to the folklore that as far as constructive type theories go, infinitary QITs are more expressive than the combination of ordinary inductive (or inductive-recursive, or inductive-inductive) types with quotient types. In this paper we use Abel's sized types [2] to show that, for a wide class of QITs, this view is not justified. Thus we make two main contributions:
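The first observation, that Bag X is the quotient of finite lists by the congruence generated by swap, can be sketched at the set level in Python (an informal model, not Hofmann-style quotient types themselves). Since adjacent transpositions generate all permutations, two lists are identified exactly when one permutes the other; for orderable elements, sorting picks a canonical representative.

```python
# Lists of elements quotiented by the congruence generated by
#   swap : x :: y :: ys ≡ y :: x :: ys.
# Adjacent transpositions generate all permutations, so two lists are
# identified exactly when one is a permutation of the other. For orderable
# elements, sorting picks a canonical representative of each class.
def bag(xs):
    return tuple(sorted(xs))

# The element constructors of Bag X lift to the quotient (finite arities,
# so no choice principle is needed):
empty = bag([])
def cons(x, b):
    return bag((x,) + b)

assert bag([2, 1, 1]) == bag([1, 2, 1]) == bag([1, 1, 2])
assert cons(3, cons(1, empty)) == cons(1, cons(3, empty))  # swap holds
```

No analogous normal form is available for the countably branching ωTree example, which is precisely where countable choice would otherwise be needed.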

First we define a family of QITs called QW-types and give elimination and computation rules for them (Section 2). The usual W-types of Martin-Löf [22] are inductive types giving the algebraic terms over a possibly infinitary signature.

One specifies a QW-type by giving a family of equations between such terms. So such QITs give initial algebras for possibly infinitary algebraic theories. As we indicate in Section 3, they can encode a very wide range of examples of possibly infinitary quotient-inductive types, namely those that do not involve constructors taking previously constructed equalities as arguments (so do not cover the infinitary extension of the very general scheme considered by Dybjer and Moeneclaey [12]). In set theory with the Axiom of Choice (AC), QW-types can be constructed simply as Quotients of the underlying W-type—hence the name.

Secondly, we prove that contrary to expectation, without AC it is still possible to construct QW-types using quotients, but not simply by quotienting a W-type. Instead, the type to be quotiented and the relation by which to quotient are given simultaneously by definitions that refer to each other. Thus our construction (in Section 4) involves inductive-inductive definitions [15]. The elimination and computation functions which witness that the quotiented type correctly represents the required QW-type are defined recursively. In order to prove that our recursive definitions terminate we combine the use of inductive definitions involving strictly positive occurrences of quotient types with sized types (currently, we do not know whether it is possible to avoid sizing in favour of, say, a suitable well-founded termination ordering). Sized types provide a convenient constructive abstraction of what classically would be accomplished with sequences of transfinite ordinal length.

#### The type theory in which we work

To present our results we need a version of Martin-Löf Type Theory with (1) uniqueness of identity proofs, (2) quotient types and hence also function extensionality, (3) inductive-inductive datatypes (with strictly positive occurrences of quotient types) and (4) sized types. Lean 3 provides (1) and (2) out of the box, but unfortunately also the Axiom of Choice. Neither it nor Coq provides (3) and (4). Agda provides (1) via unrestricted dependent pattern-matching, (2) via a combination of postulates and the rewriting mechanism of Cockx and Abel [8], (3) via its very liberal mechanism for mutual definitions and (4) thanks to the work of Abel [2]. Therefore we make use of the type theory implemented by Agda (version 2.6.0.1) to give formal proofs of our results. The Agda code can be found at doi: 10.17863/CAM.48187. In this paper we describe the results informally, using Agda-style notation for dependent type theory. In particular we use Set to denote the universe at the lowest level of a countable hierarchy of (Russell-style) universes. We also use Agda's convention that an implicit argument of an operation can be made explicit by enclosing it in {braces}.

Acknowledgement We would like to acknowledge the contribution Ian Orton made to the initial development of the work described here. He and the first author supervised the third author's Master's dissertation Quotient Inductive Types: A Schema, Encoding and Interpretation, in which the notion of QW-type (there called a W<sup>+</sup>-type) was introduced.

## 2 QW-types

We begin by recalling some facts about types of well-founded trees, the W-types of Martin-Löf [22]. We take signatures to be elements of the dependent product

$$\mathbf{Sig} = \sum A : \mathbf{Set}, (A \to \mathbf{Set}) \tag{1}$$

So a signature is given by a pair Σ = (A, B) consisting of a type A : Set and a family of types B : A → Set. Each such signature determines a polynomial endofunctor [1, 16] S{Σ} : Set → Set whose value at X : Set is the following dependent product

$$\mathfrak{S}\{\Sigma\}X = \sum a:A,(B\,a \to X) \tag{2}$$

An S-algebra is by definition an element of the dependent product

$$\mathsf{Alg}\{\Sigma\} = \sum X : \mathsf{Set}, (\mathsf{S} \, X \to X) \tag{3}$$

S-algebra morphisms (X, s) → (X′, s′) are given by functions h : X → X′ together with an element of the type

$$\mathsf{isHom}\,h = (a:A)(b:B\,a \to X) \to s'(a, h \circ b) \equiv h(s(a,b))\tag{4}$$

Then the W-type W{Σ} determined by Σ is the underlying type of an initial S-algebra. More generally, Dybjer [11] shows that the initial algebra of any nonnested, strictly positive endofunctor on Set is given by a W-type; and Abbott, Altenkirch, and Ghani [1] extend this to the case with nested uses of W-types as part of their work on containers. (These proofs take place in extensional type theory [22], but work just as well in the intensional type theory with uniqueness of identity proofs and function extensionality that we are using here.)
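To make the W-type picture concrete at the set level, here is a small Python sketch (an informal model, not the type theory): a signature with a nullary and a binary operation, a well-founded tree over it, and the fold induced by initiality into an arbitrary S-algebra.

```python
# A signature Σ = (A, B): names of operations together with their arities.
# Arities B a are modeled as finite lists of positions (the type B a).
A = ["leaf", "node"]
B = {"leaf": [], "node": [0, 1]}

# An element of S{Σ} X is a pair (a, b) with b : B a → X, modeled as a
# dict; elements of the W-type W{Σ} are well-founded trees of such pairs.
tree = ("node", {0: ("leaf", {}),
                 1: ("node", {0: ("leaf", {}), 1: ("leaf", {})})})

# Initiality: any S-algebra s : S X → X induces a unique fold W{Σ} → X.
def fold(s, t):
    a, b = t
    return s(a, {pos: fold(s, child) for pos, child in b.items()})

# Example algebra: count the leaves of a tree.
leaves = fold(lambda a, b: 1 if a == "leaf" else sum(b.values()), tree)
assert leaves == 3
```

Changing the algebra changes the interpretation without touching the tree; for instance `lambda a, b: 0 if a == "leaf" else 1 + max(b.values())` computes the depth.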

More concretely, given a signature Σ = (A, B), if one thinks of elements a : A as names of operation symbols whose (not necessarily finite) arity is given by the type B a : Set, then the elements of W{Σ} represent the closed algebraic terms (i.e. well-founded trees) over the signature. From this point of view it is natural to consider not only closed terms solely built up from operations, but also open terms additionally built up with variables drawn from some type X. As well as allowing operators of possibly infinite arity, we also allow terms involving possibly infinitely many variables (the second example in Figure 1 involves such terms). Categorically, the type T{Σ}X of such open terms is the free S-algebra on X and is another W-type, for the signature obtained from Σ by adding the elements of X as nullary operations. Nevertheless, it is convenient to give a direct inductive definition:

$$\begin{aligned} \text{data } & \mathsf{T}\{\Sigma : \mathsf{Sig}\}(X : \mathsf{Set}) : \mathsf{Set} \text{ where} \\ & \eta : X \to \mathsf{T}\,X \\ & \sigma : \mathsf{S}(\mathsf{T}\,X) \to \mathsf{T}\,X \end{aligned} \tag{5}$$

Given an S-algebra (Y, s) : Alg{Σ} and a function f : X → Y, the unique morphism of S-algebras from the free S-algebra (T X, σ) on X to (Y, s) has underlying function T X → Y mapping each t : T X to the element t ≫= f in Y defined<sup>1</sup> by recursion on the structure of t:

$$\begin{aligned} \eta\,x \gg= f &= f\,x \\ \sigma(a, b) \gg= f &= s(a, \lambda x \to b\,x \gg= f) \end{aligned} \tag{6}$$

As the notation suggests, ≫= is the Kleisli lifting operation ("bind") for a monad structure on T; indeed, T is the free monad on the endofunctor S.
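The term monad and its bind can likewise be modelled informally in Python (tags and dictionaries stand in for the constructors of (5); the operation name "plus" and the integer algebra are illustrative assumptions, not part of the paper's development):

```python
# Terms T{Σ} X as in (5): a variable η x or an operation σ(a, b) applied
# to subterms; tags distinguish the two constructors.
def eta(x):
    return ("eta", x)

def sigma(a, b):  # b maps positions to subterms
    return ("sigma", a, b)

# The Kleisli lifting of (6): given an algebra s : S Y → Y and an
# environment f : X → Y, evaluate a term by structural recursion.
def bind(t, f, s):
    if t[0] == "eta":
        return f(t[1])
    _, a, b = t
    return s(a, {pos: bind(u, f, s) for pos, u in b.items()})

# Evaluate plus(x, plus(y, x)) in the algebra of integers under addition,
# with environment x ↦ 2 and y ↦ 3.
t = sigma("plus", {0: eta("x"), 1: sigma("plus", {0: eta("y"), 1: eta("x")})})
env = {"x": 2, "y": 3}
result = bind(t, lambda v: env[v], lambda a, b: b[0] + b[1])
assert result == 7
```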

The notion of "QW-type" that we introduce in this section is obtained from that of W-type by considering not only the algebraic terms over a given signature, but also equations between terms. To code equations we use a type-theoretic rendering of a categorical notion of equational system introduced by Fiore and Hur, referred to as term equational system [14, Section 2] and as monadic equational system [13, Section 5], here instantiated to free monads on signatures.

Definition 1. A system of equations over a signature Σ : Sig is specified by


Thus a system of equations over Σ is an element of the dependent product

$$\mathsf{Syseq}\{\Sigma\} = \sum E : \mathsf{Set}, \sum V : (E \to \mathsf{Set}),\; ((e : E) \to \mathsf{T}(V\,e)) \times ((e : E) \to \mathsf{T}(V\,e)) \tag{7}$$

An S{Σ}-algebra S X → X satisfies the system of equations ε = (E, V, l, r) : Syseq{Σ} if there is an element of type

$$\mathsf{Sat}\{\varepsilon\}X = (e:E)(\rho : V\,e \to X) \to ((l\,e) \gg= \rho) \equiv ((r\,e) \gg= \rho) \tag{8}$$

The category-theoretic view of QW-types is that they are simply S-algebras that are initial among those satisfying a given system of equations:

Definition 2. <sup>A</sup> QW-type for a signature Σ=(A, B) : Sig and system of equations ε = (E, V,l, r) : Syseq{Σ} is given by a type QW{Σ}{ε} : Set equipped with an S-algebra structure and a proof that it satisfies the equations

$$\mathsf{qwintro} : \mathsf{S}(\mathsf{QW}) \to \mathsf{QW} \tag{9}$$

$$\mathsf{qwequ} : \mathsf{Sat}\{\varepsilon\} (\mathsf{QW}) \tag{10}$$

together with functions that witness that it is the initial such algebra:

$$\mathsf{qwrec}: (X:\mathsf{Set})(s:\mathsf{S}\,X\to X) \to \mathsf{Sat}\,X \to \mathsf{QW} \to X \tag{11}$$

$$\mathsf{qwrechom} : (X : \mathsf{Set})(s: \mathsf{S} \, X \to X)(p: \mathsf{Sat} \, X) \to \mathsf{isHom}(\mathsf{qwrec} \, X \, s \, p) \qquad (12)$$

$$\mathsf{qwuniq} : (X : \mathsf{Set})(s : \mathsf{S}\,X \to X)(p : \mathsf{Sat}\,X)(f : \mathsf{QW} \to X) \to \mathsf{isHom}\,f \to \mathsf{qwrec}\,X\,s\,p \equiv f \tag{13}$$

<sup>1</sup> Note that the definition of ≫= depends on the S-algebra structure s; in Agda we use instance arguments to hide this dependence.

Given the definitions of <sup>S</sup>{Σ} in (2) and Sat{ε} in (8), properties (9) and (10) suggest that a QW-type is an instance of the notion of quotient-inductive type [5] with element constructor qwintro and equality constructor qwequ. For this to be so, QW{Σ}{ε} needs to have the requisite dependently-typed elimination and computation<sup>2</sup> properties for these element and equality constructors. As Proposition 1 below shows, these follow from (11)–(13), because we are working in a type theory with function extensionality (by virtue of assuming quotient types). To state the proposition we need a dependent version of (6). For each

$$\begin{array}{l} P: \mathsf{QW} \to \mathsf{Set} \\ p: (a:A)(b:B\, a \to \mathsf{QW}) \to ((x:B\, a) \to P(b\, x)) \to P(\mathsf{qwintro}(a,b)) \end{array} \tag{14}$$

type X : Set, function f : X → Σ x : QW, P x and term t : T(X), we get an element lift P p f t : P(t ≫= fst ◦ f) defined by recursion on the structure of t:

$$\begin{aligned} \mathsf{lift}\,P\,p\,f\,(\eta\,x) &= \mathsf{snd}(f\,x) \\ \mathsf{lift}\,P\,p\,f\,(\sigma(a,b)) &= p\,a\,(\lambda x \to b\,x \gg= (\mathsf{fst} \circ f))\,(\mathsf{lift}\,P\,p\,f \circ b) \end{aligned} \tag{15}$$

Proposition 1. For a QW-type as in the above definition, given <sup>P</sup> and <sup>p</sup> as in (14) and a term of type

$$(e:E)(f : V\,e \to \sum x : \mathsf{QW}, P\,x) \to \mathsf{lift}\,P\,p\,f\,(l\,e) \mathrel{\equiv\equiv} \mathsf{lift}\,P\,p\,f\,(r\,e) \tag{16}$$

there are elimination and computation terms:

qwelim : (x : QW) → P x
qwcomp : (a : A)(b : B a → QW) → qwelim(qwintro(a, b)) ≡ p a b (qwelim ◦ b)

(Note that (16) uses McBride's heterogeneous equality type [23], which we denote by ≡≡, because lift P p f (l e) and lift P p f (r e) inhabit different types, namely P(l e ≫= fst ◦ f) and P(r e ≫= fst ◦ f) respectively.)

The proof of the proposition can be found in the accompanying Agda code (doi: 10.17863/CAM.48187).

So QW-types are in particular quotient-inductive types (QITs). Conversely, in the next section we show that a wide range of QITs can be encoded as QW-types. Then in Section 4 we prove:

Theorem 1. In constructive dependent type theory with uniqueness of identity proofs (or equivalently the Axiom K of Streicher [27]) and universes with inductive-inductive datatypes [15] permitting strictly positive occurrences of quotient types [18] and sized types [2], for every signature and system of equations (Definition 1) there is a QW-type as in Definition 2.

<sup>2</sup> We only establish the computation property up to propositional rather than definitional equality; so, using the terminology of Shulman [25], these are typal quotient-inductive types.

Remark 1 (Free algebras). Definition 2 defines QW-types as initial algebras. A corollary of Theorem 1 is that free algebras also exist. In other words, given a signature Σ and a type X : Set, there is an S-algebra

$$(\mathsf{F}\{\Sigma\}\{\varepsilon\}X, \mathsf{S}\{\Sigma\}(\mathsf{F}\{\Sigma\}\{\varepsilon\}X) \to \mathsf{F}\{\Sigma\}\{\varepsilon\}X)$$

satisfying a system of equations ε and equipped with a function X → F{Σ}{ε}X, and which is universal among such S-algebras. Thus QW{Σ}{ε} is isomorphic to F{Σ}{ε}∅, where ∅ is the empty datatype.

To see that such free algebras can be constructed as QW-types, given a signature Σ = (A, B), let Σ<sub>X</sub> be the signature (X ⊎ A, B′), where X ⊎ A is the coproduct datatype (with constructors inl : X → X ⊎ A and inr : A → X ⊎ A) and where B′ : X ⊎ A → Set maps each inl x to ∅ and each inr a to B a. Given a system of equations ε = (E, V, l, r), let ε<sub>X</sub> be the system (E, V, l<sub>X</sub>, r<sub>X</sub>) where for each e : E, l<sub>X</sub> e = l e ≫= η and r<sub>X</sub> e = r e ≫= η (using η : V e → T{Σ<sub>X</sub>}(V e) as in (5) and the S{Σ}-algebra structure s on T{Σ<sub>X</sub>}(V e) given by s(a, b) = σ(inr a, b)). Then one can show that the QW-type QW{Σ<sub>X</sub>}{ε<sub>X</sub>} is the free algebra F{Σ}{ε}X, with the function X → F{Σ}{ε}X sending each x : X to qwintro(inl x, \_) : QW{Σ<sub>X</sub>}{ε<sub>X</sub>}, and the S{Σ}-algebra structure on F{Σ}{ε}X given by the function sending (a, b) : S(QW{Σ<sub>X</sub>}{ε<sub>X</sub>}) to qwintro(inr a, b).

Remark 2 (Strictly positive equational systems). A very general, categorical notion of equational system was introduced by Fiore and Hur [14, Section 3]. They regard any endofunctor S : Set <sup>→</sup> Set as a functorial signature. A functorial term over such a signature, <sup>S</sup> -G L, is specified by another functorial signature G : Set <sup>→</sup> Set (the term's context) together with a functor L from S-algebras to G-algebras that commutes with the forgetful functors to Set. Then an equational system is given by a pair of such terms in the same context, S - G L and S - G R say. An S-algebra s : S X <sup>→</sup> X satisfies the equational system if L(X, s) and R(X, s) are equal G-algebras.

Taking the strictly positive endofunctors Set → Set to be the smallest collection containing the identity and constant endofunctors and closed under forming dependent products and dependent functions over fixed types then, as in [11] (and also in the type theory in which we work), up to isomorphism every such endofunctor is of the form S{Σ} for some signature Σ : Sig. If we restrict attention to equational systems S - G L, R with S and G strictly positive, then it turns out that such equational systems are in bijection with the systems of equations from Definition 1, and the two notions of satisfaction for an algebra coincide in that case. (See our Agda development for a proof of this.) So Dybjer's characterisation of W-types as initial algebras for strictly positive endofunctors generalises to the fact that QW-types are initial among the algebras satisfying strictly positive equational systems in the sense of Fiore and Hur.

## 3 Quotient-inductive types

Higher inductive types (HITs) were originally motivated by their use in homotopy type theory to construct homotopical cell complexes, such as spheres, tori, and so on [29]. Intuitively, a higher inductive type is an inductive type with point constructors that additionally allows path constructors, surface constructors, etc., which are represented as elements of (iterated) identity types. For example, the sphere is given by the HIT<sup>3</sup>:

$$\begin{aligned} \text{data } \mathbb{S}^2 &: \mathsf{Set} \text{ where} \\ \mathsf{base} &: \mathbb{S}^2 \\ \mathsf{surf} &: \mathsf{refl} \equiv_{\,\mathsf{base}\, \equiv_{\mathbb{S}^2} \mathsf{base}} \mathsf{refl} \end{aligned} \tag{17}$$

In the presence of the UIP axiom we will refer to HITs as quotient inductive types (QITs) [5], since all paths beyond the first level are trivial and any HIT is truncated to an h-set. We use the terms element constructor and equality constructor to refer to the point constructors and the only non-trivial level of path constructors.

We believe that QW-types can be used to encode a wide range of QITs: see Conjecture 1 below. As evidence, we give several examples of QITs encoded as QW-types, beginning with the two examples of QITs in Figure 1, giving the corresponding signature (A, B) and system of equations (E, V,l, r) as in Definition 2.

Example 1 (Finite multisets). The element constructors for finite multisets are encoded exactly as with a W-type: the constructors are [] and x :: \_ for each x : X. So we take A to be **1** ⊎ X, the coproduct of the unit type **1** (whose single constructor is denoted tt) with X. The arity of [] is zero, and the arity of each x :: \_ is one, represented by the empty type ∅ and the unit type **1** respectively; so we take B : A → Set to be the function [λ\_ → ∅ | λ\_ → **1**] : **1** ⊎ X → Set mapping inl tt to ∅ and each inr x to **1**.

The swap equality constructor is parameterised by elements of E = X × X. For each (x, y) : E, swap x y yields an equation involving a single free variable (called ys : Bag X in Figure 1); so we take V : E → Set to be λ\_ → 𝟙. Each side of the equation named by swap x y is coded by an element of T{Σ}(V(x, y)) = T{Σ}(𝟙). Recalling the definition of T from (5), the single free variable corresponds to η tt : T{Σ}(𝟙); the left-hand side of the equation is then σ(inr x, (λ\_ → σ(inr y, (λ\_ → η tt)))) and the right-hand side is σ(inr y, (λ\_ → σ(inr x, (λ\_ → η tt)))).

So, altogether, the signature and system of equations for the QW-type corresponding to the first example in Figure 1 is:

$$\begin{aligned} A &= \mathbb{1} \uplus X & E &= X \times X \\ B &= [\lambda\_ \to \emptyset \mid \lambda\_ \to \mathbb{1}] & V &= \lambda\_ \to \mathbb{1} \\ l &= \lambda\,(x, y) \to \sigma(\mathsf{inr}\ x, (\lambda\_ \to \sigma(\mathsf{inr}\ y, (\lambda\_ \to \eta\ \mathsf{tt})))) \\ r &= \lambda\,(x, y) \to \sigma(\mathsf{inr}\ y, (\lambda\_ \to \sigma(\mathsf{inr}\ x, (\lambda\_ \to \eta\ \mathsf{tt})))) \end{aligned}$$
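In QIT syntax, this data corresponds to the following Agda-style declaration, reconstructed from the constructor names used above ([], \_::\_, swap, and the variable ys). It is a sketch only: the equality constructor swap is hypothetical syntax, not accepted by plain Agda.

```
-- Sketch of the Bag QIT from Figure 1; the equality constructor
-- swap is hypothetical syntax, not accepted by plain Agda.
data Bag (X : Set) : Set where
  []   : Bag X
  _::_ : X → Bag X → Bag X
  swap : (x y : X)(ys : Bag X) → x :: (y :: ys) ≡ y :: (x :: ys)
```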

<sup>3</sup> The subscript on <sup>≡</sup> will be treated as an implicit argument and omitted when clear.

Example 2 (Unordered countably-branching trees). Here the element constructors are leaf, of arity zero, and node x for each x : X, of arity **N**. So we use the signature with A = 𝟙 ⊎ X and B = [λ\_ → ∅ | λ\_ → **N**].

The perm equality constructor is parameterised by elements of

$$E = X \times \textstyle\sum f : (\mathbb{N} \to \mathbb{N}),\ \mathsf{isIso}\,f$$

For each element (x, f, i) of that type, perm x f i yields an equation involving an **N**-indexed family of variables (called g : **N** → ωTree X in Figure 1); so we take V : E → Set to be λ\_ → **N**. Each side of the equation named by perm x f i is coded by an element of T{Σ}(V(x, f, i)) = T{Σ}(**N**). The **N**-indexed family of variables is represented by the function η : **N** → T{Σ}(**N**) and its permuted version by η ∘ f. Thus the left- and right-hand sides of the equation named by perm x f i are coded respectively by the elements σ(inr x, η) and σ(inr x, η ∘ f) of T{Σ}(**N**).

So, altogether, the signature and system of equations for the QW-type corresponding to the second example in Figure 1 is:

$$\begin{aligned} A &= \mathbb{1} \uplus X & E &= X \times \textstyle\sum f : (\mathbb{N} \to \mathbb{N}),\ \mathsf{isIso}\,f \\ B &= [\lambda\_ \to \emptyset \mid \lambda\_ \to \mathbb{N}] & V &= \lambda\_ \to \mathbb{N} \\ l &= \lambda\,(x, \\_, \\_) \to \sigma(\mathsf{inr}\ x, \eta) \\ r &= \lambda\,(x, f, \\_) \to \sigma(\mathsf{inr}\ x, \eta \circ f) \end{aligned}$$
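Correspondingly, the second QIT of Figure 1 can be sketched in the same hypothetical equality-constructor syntax (again not accepted by plain Agda, with isIso as in the text):

```
-- Sketch of the ωTree QIT from Figure 1; perm is hypothetical
-- equality-constructor syntax.
data ωTree (X : Set) : Set where
  leaf : ωTree X
  node : X → (ℕ → ωTree X) → ωTree X
  perm : (x : X)(f : ℕ → ℕ) → isIso f → (g : ℕ → ωTree X) →
         node x g ≡ node x (g ∘ f)
```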

That unordered countably-branching trees form a QW-type is significant, since no previous work on the various subclasses of QITs [6, 26, 28, 12] (or indeed QIITs [19, 10]) supports infinitary QITs. See Example 5 for another, more substantial infinitary QW-type; this extension represents one of our main contributions. QW-types also generalise prior developments: the internal encodings for particular subclasses of 1-HITs given by Sojakova [26] and Swan [28] are direct instances of QW-types, as the next two examples show.

Example 3. W-suspensions [26] are an instance of QW-types. The data for a W-suspension is: A′, C′ : Set, a type family B′ : A′ → Set, and functions l′, r′ : C′ → A′. The equivalent QW-type is:

$$\begin{aligned} A &= A' & E &= C'\\ B &= B' & V &= \lambda \, c \to (B'(l'c)) \times (B'(r'c)) \end{aligned} \qquad \begin{aligned} l &= \lambda \, c \to \sigma((l'c), \eta) \\ r &= \lambda \, c \to \sigma((r'c), \eta) \end{aligned}$$

Example 4. The non-indexed case of W-types with reductions [28] also gives QW-types. The data of such a type is: Y : Set, X : Y → Set, and a reindexing map R : (y : Y) → X y. The reindexing map identifies a term σ(y, α) with the subterm α(R y) used to construct it. The equivalent QW-type is given by:

$$\begin{aligned} A &= Y & E &= Y & l &= \lambda\,y \to \sigma(y, \eta) \\ B &= X & V &= X & r &= \lambda\,y \to \eta\,(R\,y) \end{aligned}$$

Example 5. Lumsdaine and Shulman [21, Section 9] give an example of a HIT not constructible in type theory from only pushouts and **<sup>N</sup>**. Their HIT <sup>F</sup> can be thought of as a set of notations for countable ordinals. It consists of three point constructors: 0 : F, S : F <sup>→</sup> F, and sup : (**<sup>N</sup>** <sup>→</sup> F) <sup>→</sup> F, and five path constructors which are omitted here for brevity. It is inspired by the infinitary algebraic theory of Blass [7, Section 9] and hence it is not surprising that it can be encoded by a QW-type; the details can be found in our Agda code.
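Restricting to the element constructors named above, the shape of F is as follows (an Agda-style sketch; zeroF is our name for the constructor written 0 in the text, and the five path constructors are omitted, as in the text):

```
-- Element constructors of Lumsdaine and Shulman's ordinal-notation
-- HIT F; the five path constructors are omitted.
data F : Set where
  zeroF : F
  S     : F → F
  sup   : (ℕ → F) → F
```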

#### 3.1 General QIT schemas

Basold, Geuvers, and van der Weide [6] present a schema (though not a model) for infinitary QITs that do not support conditional path equations. Constructors are defined by arbitrary polynomial endofunctors built up using (non-dependent) products and sums, which means in particular that parameters and arguments can occur in any order. They require constructors to be in uncurried form.

Dybjer and Moeneclaey [12, Sections 3.1 and 3.2] present a schema for finitary QITs that supports conditional path equations, where constructors are allowed to take inductive arguments not just of the datatype being declared, but also of its identity type. This schema can be generalised to infinitary QITs with conditional path equations. We believe this extension of their schema to be the most general schema for QITs. The schema requires all parameters to appear before all arguments, whereas the schema for regular inductive types in Agda is more flexible, allowing parameters and arguments in any order.

We wish to combine the schema for infinitary QITs of Basold, Geuvers, and van der Weide [6] with the schema for QITs with conditional path equations of Dybjer and Moeneclaey [12] to provide a general schema. Moreover, we would like to combine the arbitrarily ordered parameters and arguments of the former with the curried constructors of the latter in order to support flexible pattern matching.

For consistency with the definition of inductive types in Agda [9, equation (25) and figure 1] we will define strictly positive (i.e. polynomial) endofunctors in terms of strictly positive telescopes.

A telescope is given by the grammar:

$$\begin{array}{ll} \Delta ::= \epsilon & \text{empty telescope} \\ \mid \quad (x:A)\Delta & (x \notin \text{dom}(\Delta)) \text{ non-empty telescope} \end{array} \tag{18}$$

A telescope extension (x : A)Δ binds (free) occurrences of x inside the tail <sup>Δ</sup>. The type A may contain free variables that are later bound by further telescope extensions on the left. A telescope can also exist in a context which binds any free variables not already bound in the telescope. Such a context is implicit in the following definitions. A function type <sup>Δ</sup> <sup>→</sup> C from a telescope <sup>Δ</sup> to a type C is defined as an iterated dependent function type by:

$$\begin{aligned} \epsilon \to C \stackrel{\text{def}}{=} C\\ (x:A)\Delta \to C \stackrel{\text{def}}{=} (x:A) \to (\Delta \to C) \end{aligned} \tag{19}$$

<sup>A</sup> strictly positive endofunctor on a variable Y is presented by a strictly positive telescope

$$\Delta = (x\_1 : \Phi\_1(Y))(x\_2 : \Phi\_2(Y)) \cdots (x\_n : \Phi\_n(Y))\epsilon \tag{20}$$

where each type scheme Φ<sub>i</sub> is described by an expression on Y made up of Π-types, Σ-types, and any (previously defined "constant") types A not containing Y, according to the grammar:

$$\Phi(Y), \Psi(Y) ::= \quad (y:A) \to \Phi(Y) \quad | \quad \Sigma \, p: \Phi(Y), \Psi(Y) \quad | \quad A \quad | \quad Y \quad \tag{21}$$

For example, Δ ≝ (x : X)(f : **N** → Y) is the strictly positive telescope for the node constructor in Figure 1. In this instance, reordering x and f is permitted by exchange. Note that the variable Y can never appear in the argument position of a Π-type.
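Instantiating definition (19) at this telescope recovers the familiar curried type of the constructor:

$$\Delta \to Y \;=\; (x : X) \to (f : \mathbb{N} \to Y) \to Y$$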

Now it is possible to define the form of the endpoints of an equality (within the context of a strictly positive telescope), corresponding to the notion of an abstract syntax tree with free variables. With this intuition in mind, we can take the definition in Dybjer and Moeneclaey's presentation [12] of endpoints given by point constructor patterns:

$$l, r, p ::= \quad c\_i \ k \quad \mid \quad y \tag{22}$$

where y : Y is in the context of the telescope for the equality constructor, and k is a term built without any rule for Y, but which may use other point constructor patterns p : Y. (That is, any sub-term of type Y must either be a variable y : Y found in the telescope, or a constructor for Y applied to further point constructor patterns and previously defined constants. It could not, for instance, use the function application rule for Y with some function g : M → Y, not least since such functions cannot be defined before Y itself is defined.) Note that this exactly matches the type T in (5).
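For instance, in the context of the telescope for the swap constructor of Figure 1, with x, y : X constants and ys : Y the telescope variable, the two endpoints are the point constructor patterns

$$l = x :: (y :: ys) \qquad\qquad r = y :: (x :: ys)$$

each of which applies the constructor \_::\_ only to telescope variables and previously defined constants.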

Basold, Geuvers, and van der Weide's presentation has a slightly more general notion of constructor term [6, Definition 6] (Dybjer and Moeneclaey's presentation [12] has more restricted telescopes). It is defined by rules which operate in the context of a strictly positive (polynomial) telescope and permit the use of its bound variables and of the constructors c<sub>i</sub>, but no other rules for Y. We take the dependent form of their rules for products and functions. Note that these rules do not allow the use of terms of type ≡<sub>Y</sub> in the endpoints.

As with inductive types, the element constructors of QITs are specified by strictly positive telescopes. The equality constructors also permit conditions to appear in strictly positive positions, where <sup>l</sup> and <sup>r</sup> are constructor terms according to grammar (22):

$$\Phi(Y), \Psi(Y) ::= \left(\textit{same grammar as in } (21)\right) \mid l \equiv\_Y r \tag{23}$$

Definition 3. A QIT is defined by a list of named element constructors and equality constructors:

$$\begin{aligned} \text{data } \mathsf{Y} &: \mathsf{Set} \text{ where} \\ \mathsf{c}\_1 &: \Delta\_1 \to \mathsf{Y} \\ &\ \ \vdots \\ \mathsf{c}\_n &: \Delta\_n \to \mathsf{Y} \\ \mathsf{p}\_1 &: \Theta\_1 \to l\_1 \equiv\_{\mathsf{Y}} r\_1 \\ &\ \ \vdots \\ \mathsf{p}\_m &: \Theta\_m \to l\_m \equiv\_{\mathsf{Y}} r\_m \end{aligned}$$

where the Δ<sub>i</sub> are strictly positive telescopes on Y according to (21), and the Θ<sub>j</sub> are strictly positive telescopes on Y and ≡<sub>Y</sub> in which conditions may also occur in strictly positive positions, according to (23).

QITs without equality constructors are inductive types. If none of the equality constructors contains Y in an argument position then the QIT is called non-recursive; otherwise it is called recursive [6]. If none of the equality constructors contains an equality in Y then we call it a non-conditional, or equational, QIT; otherwise it is called a conditional [12], or quasi-equational, QIT. If all of the constant types A in the constructors are finite (isomorphic to Fin n for some n : **N**) then it is called a finitary QIT [12]; otherwise, it is called a generalised [12], or infinitary, QIT. We are not aware of any existing examples in the literature of HITs whose point constructors are conditional (though such are not difficult to imagine), nor of any schemas for HITs that allow such definitions. However, we believe this is worth investigating further.

Conjecture 1. Any equational QIT can be encoded as a QW-type.

We believe this can be proved analogously to Dybjer's approach [11] for inductive types, though the endpoints still need to be considered and we have not yet translated the schema of Definition 3 into Agda.

Remark 3. Assuming Conjecture 1, Basold, Geuvers, and van der Weide's schema [6], being an equational (non-conditional) instance of Definition 3, can be encoded as a QW-type.

## 4 Construction of QW-types

In Section 2 we defined a QW-type to be initial among algebras over a given (possibly infinitary) signature satisfying a given system of equations (Definition 2). If one interprets these notions in classical Zermelo-Fraenkel set theory with the Axiom of Choice (ZFC), one regains the usual notion from universal algebra of initial algebras for infinitary equational theories. Since in the set-theoretic interpretation there is an upper bound on the cardinality of arities of operators in a given signature Σ, the ordinal-indexed sequence S<sup>α</sup>(∅) of iterations of the functor in (2), starting from the empty set, eventually becomes stationary; and

so the sequence has a small colimit, namely the set W{Σ} of well-founded trees over <sup>Σ</sup>. A system of equations ε (Definition 1) over <sup>Σ</sup> generates a <sup>Σ</sup>-congruence relation <sup>∼</sup> on <sup>W</sup>{Σ}. The quotient set <sup>W</sup>{Σ}/<sup>∼</sup> yields the desired initial algebra for (Σ, ε) provided the <sup>S</sup>-algebra structure on <sup>W</sup>{Σ} induces one on the quotient set. It does so, because for each operator, using AC one can pick representatives of the (possibly infinitely many) equivalence classes that are the arguments of the operator, apply the interpretation of the operator in W{Σ} and then take the equivalence class of that. So the set-theoretic model of type theory in ZFC models QW-types.

Is this use of choice really necessary? Blass [7, Section 9] shows that if one drops AC and just works in ZF, then provided a certain large cardinal axiom is consistent with ZFC, it is consistent with ZF that there is an infinitary equational theory with no initial algebra. He shows this by first exhibiting a countably presented equational theory whose initial algebra has to be an uncountable regular cardinal; and secondly appealing to the construction of Gitik [17] of a model of ZF with no uncountable regular cardinals (assuming a certain large cardinal axiom). Lumsdaine and Shulman [21] turn the infinitary equational theory of Blass into a higher-inductive type that cannot be proved to exist in ZF (and hence cannot be constructed in type theory just using pushouts and the natural numbers). We noted in Example 5 that this higher inductive type can be presented as a QW-type.

So one cannot hope to construct QW-types using a type theory which is interpretable in just ZF. However, the type theory in which we work, with its universes closed under inductive-inductive definitions, already requires going beyond ZF to be able to give it a naive, classical set-theoretic interpretation (by assuming the existence of enough strongly inaccessible cardinals, for example). So the above considerations about initial algebras for infinitary equational theories in classical set theory do not rule out the construction of QW-types in the type theory in which we work. However, something more than just quotienting a W-type is needed in order to prove Theorem 1.

Figure 2 gives a first attempt to do this (which we will later modify using sized types to get around a termination problem). The definition is relative to a given signature Σ : Sig and system of equations ε = (E, V, l, r) : Syseq Σ. It makes use of quotient types, which we add to Agda via postulates, as shown in Figure 3.<sup>4</sup> The REWRITE pragma makes elim R B f e (mk R x) definitionally equal to f x and is not merely a computational convenience: this is what allows function extensionality to be proved from these postulated quotient types. The POLARITY pragmas enable the postulated quotients to be used in datatype declarations at positions that Agda deems to be strictly positive; a case in point being the definitions of Q0 and Q1 in Figure 2. Agda's test for strict positivity is sound with respect to a set-theoretic semantics of inductively defined datatypes that are built up using strictly positive uses of dependent functions; the semantics of such datatypes uses initial algebras for endofunctors possessing a rank. Here we

<sup>4</sup> The actual implementation is polymorphic in universe levels, but for simplicity here we just give the level-zero version.

```
mutual
  data Q0 : Set where
    sq : T Q → Q0
  data Q1 : Q0 → Q0 → Set where
    sqeq : (e : E)(ρ : V e → Q) → Q1 (sq (T' ρ (l e))) (sq (T' ρ (r e)))
    sqη  : (x : Q0) → Q1 (sq (η (qu x))) x
    sqσ  : (s : S (T Q)) → Q1 (sq (σ s)) (sq (ι (S' (qu ∘ sq) s)))
  Q : Set
  Q = Q0 / Q1
  qu : Q0 → Q
  qu = quot.mk Q1
QW{Σ}{ε} = Q
```
Figure 2. First attempt at constructing QW-types

are allowing the inductively defined datatypes to be built up using quotients as well, but this is semantically unproblematic, since quotienting does not increase rank. (Later we need to combine the use of POLARITY with sized types; the semantics of this has been studied for System F<sup>ω</sup> [3], but needs to be explored further for Agda.)

We build up the underlying inductive type Q<sup>0</sup> to be quotiented using a constructor sq that takes well-founded trees <sup>T</sup>(Q0/Q1) of whole equivalence classes with respect to a relation Q<sup>1</sup> that is mutually inductively defined with Q0—an instance of an inductive-inductive definition [15]. The definition of Q<sup>1</sup> makes use of the actions on functions of the signature endofunctor S and its associated free monad T (Section 2); those actions are defined as follows:

$$\begin{array}{l} \mathsf{S}' : \{X\ Y : \mathsf{Set}\} \to (X \to Y) \to \mathsf{S}\,X \to \mathsf{S}\,Y \\ \mathsf{S}'\,f\,(a, b) = (a, f \circ b) \end{array} \tag{24}$$

$$\begin{array}{l} \mathsf{T}' : \{X\ Y : \mathsf{Set}\} \to (X \to Y) \to \mathsf{T}\,X \to \mathsf{T}\,Y \\ \mathsf{T}'\,f\,t = t \gg= (\eta \circ f) \end{array} \tag{25}$$

The definition of Q1 also uses the natural transformation ι : {X : Set} → S X → T X defined by ι = σ ∘ S′ η.

Turning to the proof of Theorem 1 using the definitions in Figure 2, the S-algebra structure (9) is easy to define without using any form of choice, because of the type of Q0's constructor sq. Indeed, we can just take qwintro to be qu ∘ sq ∘ ι : S(QW) → QW.<sup>5</sup> The first constructor sqeq of the data type Q1 ensures that the quotient Q0/Q1 satisfies the equations in ε, so that we get qwequ as in (10); and the other two constructors, sqη and sqσ, make identifications that

<sup>5</sup> The use of the free monad <sup>T</sup>{Σ} in the domain of sq, rather than just <sup>S</sup>{Σ}, seems necessary in order to define Q<sup>1</sup> with the properties needed for (10)–(13).

```
module quot where
  postulate
    ty   : {A : Set}(R : A → A → Set) → Set
    mk   : {A : Set}(R : A → A → Set) → A → ty R
    eq   : {A : Set}(R : A → A → Set){x y : A} → R x y → mk R x ≡ mk R y
    elim : {A : Set}(R : A → A → Set)(B : ty R → Set)(f : (x : A) → B (mk R x))
           (e : {x y : A} → R x y → f x ≅ f y)(z : ty R) → B z
    comp : {A : Set}(R : A → A → Set)(B : ty R → Set)(f : (x : A) → B (mk R x))
           (e : {x y : A} → R x y → f x ≅ f y)(x : A) → elim R B f e (mk R x) ≡ f x
{-# REWRITE comp #-}
{-# POLARITY ty ++ ++ #-}
{-# POLARITY mk _ _ * #-}
_/_ : (A : Set)(R : A → A → Set) → Set
A / R = quot.ty R
```
Figure 3. Quotient types

enable the construction of functions qwrec, qwrechom and qwuniq as in (11)–(13). However, there is a problem. Given X : Set, s : S X → X and e : Sat X, for qwrec X s e we have to construct a function r : Q → X. Since Q = Q0/Q1 is a quotient, we have to use the eliminator quot.elim from Figure 3 to define r. The following is an obvious candidate definition:

$$\begin{array}{l} \text{mutual} \\ \quad \mathsf{r} : \mathsf{Q} \to X \\ \quad \mathsf{r} = \mathsf{quot.elim}\ \mathsf{Q}\_1\ (\lambda\_ \to X)\ \mathsf{r}\_0\ \mathsf{r}\_1 \\ \quad \mathsf{r}\_0 : \mathsf{Q}\_0 \to X \\ \quad \mathsf{r}\_0\,(\mathsf{sq}\ t) = t \gg= \mathsf{r} \\ \quad \mathsf{r}\_1 : \{x\ y : \mathsf{Q}\_0\} \to \mathsf{Q}\_1\ x\ y \to \mathsf{r}\_0\ x \equiv \mathsf{r}\_0\ y \\ \quad \mathsf{r}\_1 = \cdots \end{array} \tag{26}$$

(where we have elided the details of the invariance proof r<sub>1</sub>). The problem with this mutually recursive definition is that it is not clear to us (and certainly not to Agda) whether it gives totally defined functions: although the value of r<sub>0</sub> at a typical element sq t is explained in terms of the structurally smaller element t, the explanation involves r, whose definition uses the whole function r<sub>0</sub> rather than some application of it at a structurally smaller argument. Agda's termination checker rejects the definition.

We get around this problem by using a type-based termination method, namely Agda's implementation of sized types [2]. Intuitively, this provides a type Size of "sizes" which give a constructive abstraction of features of ordinals in ZF when they are used to index sequences of sets that eventually become stationary, such as in various transfinite constructions of free algebras [20, 14]. In Agda, the type Size comes equipped with various relations and functions: given sizes

```
mutual
  data Q0 (i : Size) : Set where
    sq : {j : Size< i} → T (Q j) → Q0 i
  data Q1 (i : Size) : Q0 i → Q0 i → Set where
    sqeq : {j : Size< i}(e : E)(ρ : V e → Q j) →
           Q1 i (sq (T' ρ (l e))) (sq (T' ρ (r e)))
    sqη  : {j : Size< i}(x : Q0 j) → Q1 i (sq (η (qu j x))) (φ0 i x)
    sqσ  : {j : Size< i}{k : Size< j}(s : S (T (Q k))) →
           Q1 i (sq (σ s)) (sq (ι (S' (qu j ∘ sq) s)))
  Q : Size → Set
  Q i = (Q0 i) / (Q1 i)
  qu : (i : Size) → Q0 i → Q i
  qu i = quot.mk (Q1 i)
  φ0 : (i : Size){j : Size< i} → Q0 j → Q0 i
  φ0 i (sq z) = sq z
QW{Σ}{ε} = Q ∞
```
Figure 4. Construction of QW-types using sized types

i, j : Size, there is a type Size< j of sizes strictly smaller than j (and Size< j is treated as a subtype of Size); there is a successor operation ↑ : Size → Size (and also a join operation \_⊔ˢ\_ : Size → Size → Size, but we do not need it here); and there is a size ∞ : Size to indicate where a sequence becomes stationary. Thus we construct the QW-type QW{Σ}{ε} as Q ∞ for a suitable size-indexed sequence of types Q : Size → Set, shown in Figure 4.
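To illustrate the Size< mechanism in isolation, here are sized natural numbers in the same style as Figure 4 (a minimal sketch, independent of the QW-type construction; the names SNat, szero, ssuc and toℕ are ours):

```
{-# OPTIONS --sized-types #-}
open import Size
open import Data.Nat using (ℕ; zero; suc)

-- Each constructor argument lives at a strictly smaller size j,
-- mirroring the use of Size< in Figure 4.
data SNat (i : Size) : Set where
  szero : SNat i
  ssuc  : {j : Size< i} → SNat j → SNat i

-- The recursive call is at the smaller size j, which is what lets
-- Agda's termination checker accept the definition.
toℕ : {i : Size} → SNat i → ℕ
toℕ szero    = zero
toℕ (ssuc n) = suc (toℕ n)
```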

For each size <sup>i</sup> : Size, the type <sup>Q</sup> <sup>i</sup> is a quotient <sup>Q</sup><sup>0</sup> i/Q<sup>1</sup> <sup>i</sup>, where the constructors of the data types <sup>Q</sup><sup>0</sup> <sup>i</sup> and <sup>Q</sup><sup>1</sup> <sup>i</sup> take arguments of smaller sizes <sup>j</sup> : Size< <sup>i</sup>. Consequently in the following sized version of (26)

$$\begin{array}{l} \text{mutual} \\ \quad \mathsf{r} : \{i : \mathsf{Size}\} \to \mathsf{Q}\,i \to X \\ \quad \mathsf{r}\,\{i\} = \mathsf{quot.elim}\ (\mathsf{Q}\_1\,i)\ (\lambda\_ \to X)\ (\mathsf{r}\_0\,\{i\})\ (\mathsf{r}\_1\,\{i\}) \\ \quad \mathsf{r}\_0 : \{i : \mathsf{Size}\} \to \mathsf{Q}\_0\,i \to X \\ \quad \mathsf{r}\_0\,\{i\}\,(\mathsf{sq}\,\{j\}\ t) = t \gg= \mathsf{r}\,\{j\} \\ \quad \mathsf{r}\_1 : \{i : \mathsf{Size}\}\{x\ y : \mathsf{Q}\_0\,i\} \to \mathsf{Q}\_1\,i\,x\,y \to \mathsf{r}\_0\,x \equiv \mathsf{r}\_0\,y \\ \quad \mathsf{r}\_1 = \cdots \end{array} \tag{27}$$

the definition of r<sub>0</sub> {i} involves a recursive call via r to the whole function r<sub>0</sub>, but at a size j which is smaller than i. So now Agda accepts that the definition of qwrec X s e as r {∞}, with r as in (27), is terminating.

Thus we get a function qwrec for (11). We still have (9), but now with qwintro = qu ∞ ∘ sq {∞} ∘ ι; and as before, the constructor sqeq of Q1 in Figure 4 ensures that QW = (Q0 ∞)/(Q1 ∞) satisfies the equations ε. With these definitions it turns out that each qwrec X s e is an S-algebra morphism up to definitional

equality, so that the function qwrechom needed for (12) is straightforward to define. Finally, the function qwuniq needed for (13) can be constructed via a sequence of lemmas making use of the other two constructors of the data type Q1, namely sqη, which makes use of the auxiliary function φ0 for coercing between different size instances of Q0, and sqσ. We refer the reader to the accompanying Agda code (doi: 10.17863/CAM.48187) for the details of the construction of qwuniq. Altogether, the sized definitions in Figure 4 allow us to complete the proof of Theorem 1.

## 5 Conclusion

QW-types are a general form of QIT that capture many examples, including simple 1-cell complexes and non-recursive QITs [6], non-structural QITs [26], W-types with reductions [28], and also infinitary QITs (e.g. unordered infinitely branching trees [5], and ordinals [21]). They also capture the notion of initial (and free) algebras for strictly positive equational systems [14], analogously to how W-types capture the notion of initial (and free) algebras for strictly positive endofunctors (see Remark 2). Using Agda to formalise our results, we have shown that it is possible to construct any QW-type, even infinitary ones, in intensional type theory satisfying UIP, using inductive-inductive definitions permitting strictly positive occurrences of quotients and sized types (see Theorem 1 and Section 4). We conclude by mentioning related work and some possible directions for future work.

Quotients of monads. In view of Remark 2, Section 4 gives a construction of initial algebras for equational systems [14] on the free monad T{Σ} generated by a signature Σ. By a suitable change of signature (see Remark 1) this extends to a construction of free algebras, rather than just initial ones. We can show that the construction works for an arbitrary strictly positive monad and not just for free ones. Given such a construction one gets a quotient monad morphism from the base monad to the quotient monad. This contravariantly induces a forgetful functor from the algebras of the latter to those of the former. Using the adjoint triangle theorem, one should be able to construct a left adjoint. This would then cover examples such as the free group over a monoid, the free ring over a group, etc.

Quotient inductive-inductive types. The notion of QW-type generalises to indexed QW-types, analogously to the generalisation of W-types to Petersson-Synek trees for inductively defined indexed families of types [24, Chapter 16], and we will consider it in subsequent work. More generally, we wonder whether our analysis of QITs using quotients, inductive-inductive and sized types can be extended to cover the notion of quotient inductive-inductive type (QIIT) [4, 19]. Dijkstra [10] studies such types in depth and in Chapter 6 of his thesis gives a construction for finitary ones in terms of countable colimits, and hence in terms of countable coproducts and quotients. One could hope to pass to the infinitary case by using sized types as we have done, provided an analogue for QIITs can be found of the monadic construction in Section 4 for our class of QITs, the QW-types. Kaposi, Kovács, and Altenkirch [19] give a specification of finitary QIITs using a domain-specific type theory called the theory of signatures and prove existence of QIITs matching this specification. It might be possible to encode their theory of signatures using QW-types (it can already be encoded as a QIIT), or to extend QW-types making this possible. This would allow infinitary QIITs.

Schemas for QITs. We have shown by example that QW-types can encode a wide range of QITs. However, we have yet to extend this to a proof of Conjecture 1 that every instance of the schema for QITs considered in Section 3 can be so encoded.

Conditional path equations. In Section 3 we mentioned the fact that Dybjer and Moeneclaey [12] give a model for finitary 1-HITs and 2-HITs in which constructors are allowed to take arguments involving the identity type of the datatype being declared. On the face of it, QW-types are not able to encode such conditional QITs. We plan to consider whether it is possible to extend the notion of QW-type to allow encoding of infinitary QITs with such conditional equations.

Homotopy Type Theory (HoTT). Our development makes use of UIP (and heterogeneous equality), which is well-known to be incompatible with the Univalence Axiom [29, Example 3.1.9]. Given the interest in HoTT, it is certainly worth investigating whether a result like Theorem 1 holds in univalent foundations for a suitably coherent version of QW-types. We are currently investigating this using set-truncation.

Pattern matching for QITs and HITs. Our reduction of QITs to induction-induction, strictly positive quotients and sized types is of theoretical interest, but in practice one could wish for more direct support in systems like Agda, Lean and Coq for the very useful notion of quotient inductive types (or, more generally, for higher inductive types). Even having better support for the special case of quotient types would be welcome. It is not hard to envisage the addition of a general schema for declaring QITs; but when it comes to defining functions on them, having to do that with eliminator forms rapidly becomes cumbersome (for example, for functions of several QIT arguments). Some extension of dependently typed pattern matching to cover equality constructors as well as element constructors is needed, and the third author has begun work on that based on the approach of Cockx and Abel [9].<sup>6</sup>

<sup>6</sup> In this context it is worth mentioning that the cubical features of recent versions of Agda give access to cubical type theory [30]. This allows for easy declaration of HITs and hence in particular QITs (and quotients avoiding the need for POLARITY pragmas) and a certain amount of pattern matching when it comes to defining functions on them: the value of a function on a path constructor can be specified by using generic elements of the interval type in point-level patterns; but currently the user is given little mechanised assistance to solve the definitional equality constraints on end-points of paths that are generated by this method.

## References



## **Relative full completeness for bicategorical cartesian closed structure**

Marcelo Fiore<sup>1</sup> and Philip Saville<sup>(✉)2</sup>

<sup>1</sup> Department of Computer Science and Technology, University of Cambridge, UK marcelo.fiore@cl.cam.ac.uk

<sup>2</sup> School of Informatics, University of Edinburgh, UK philip.saville@ed.ac.uk

**Abstract.** The glueing construction, defined as a certain comma category, is an important tool for reasoning about type theories, logics, and programming languages. Here we extend the construction to accommodate '2-dimensional theories' of types, terms between types, and rewrites between terms. Taking bicategories as the semantic framework for such systems, we define the glueing bicategory and establish a bicategorical version of the well-known construction of cartesian closed structure on a glueing category. As an application, we show that free finite-product bicategories are fully complete relative to free cartesian closed bicategories, thereby establishing that the higher-order equational theory of rewriting in the simply-typed lambda calculus is a conservative extension of the algebraic equational theory of rewriting in the fragment with finite products only.

**Keywords:** glueing, bicategories, cartesian closure, relative full completeness, rewriting, type theory, conservative extension

## **1 Introduction**

*Relative full completeness for cartesian closed structure.* Every small category **C** can be viewed as an algebraic theory: its sorts are the objects of **C**, with a unary operator for each morphism of **C** and equations determined by the equalities in **C**. Suppose one freely extends **C** with finite products. Categorically, one obtains the free cartesian category **F**×[**C**] on **C**. From the well-known construction of **F**×[**C**] (see e.g. [12] and [46, §8]) it is direct that the universal functor **C** → **F**×[**C**] is fully faithful, a property we will refer to as the relative full completeness (c.f. [2,16]) of **C** in **F**×[**C**]. Type-theoretically, **F**×[**C**] corresponds to the Simply-Typed Product Calculus (STPC) over the algebraic theory of **C**, given by taking the fragment of the Simply-Typed Lambda Calculus (STLC) consisting of just the types, rules, and equational theory for products. Relative full completeness corresponds to the STPC being a conservative extension.

Consider now the free cartesian closed category **F**×,→[**C**] on **C**, type-theoretically corresponding to the STLC over the algebraic theory of **C**. Does the relative full completeness property, and hence conservativity, still hold for either **C** in **F**×,→[**C**]

or for **F**×[**C**] in **F**×,→[**C**]? Precisely, is either the universal functor **C** → **F**×,→[**C**] or its universal cartesian extension **F**×[**C**] → **F**×,→[**C**] full and faithful? The answer is affirmative, but the proof is non-trivial. One must either reason proof-theoretically (e.g. in the style of [63, Chapter 8]) or employ semantic techniques such as glueing [39, Annexe C].

In this paper we consider the question of relative full completeness in the bicategorical setting. This corresponds to the question of conservativity for 2-dimensional theories of types, terms between types, and rewrites between terms (see [32,20]). We focus on the particular case of the STLC with invertible rewrites given by β-reductions and η-expansions, and its STPC fragment. By identifying these two systems with cartesian closed, resp. finite product, structure 'up to isomorphism' one recovers a conservative extension result for rewrites akin to that for terms.

*2-dimensional categories and rewriting.* It has been known since the 1980s that one may consider 2-dimensional categories as abstract reduction systems (e.g. [54,51]): if sorts are 0-cells (objects) and terms are 1-cells (morphisms), then rewrites between terms ought to be 2-cells. Indeed, every sesquicategory (of which 2-categories are a special class) generates a rewriting relation on its 1-cells, defined by $f \rightsquigarrow g$ if and only if there exists a 2-cell $f \Rightarrow g$ (e.g. [60,58]). Invertible 2-cells may then be thought of as equality witnesses.
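The generated reduction relation can be made concrete on a finite fragment of such a structure. The following Python sketch (our illustration, with made-up term names) closes a set of generating 2-cells under identities and vertical composition, and then reads off $f \rightsquigarrow g$ as the existence of a 2-cell $f \Rightarrow g$.

```python
def rewrite_relation(terms, two_cells):
    """Close generating 2-cells (f, g) under identity 2-cells and
    vertical composition; f rewrites to g iff a 2-cell f => g exists."""
    cells = {(t, t) for t in terms} | set(two_cells)  # identity 2-cells
    changed = True
    while changed:  # vertical composition: (f => g) . (g => h) gives f => h
        changed = False
        for (f, g) in list(cells):
            for (g2, h) in list(cells):
                if g == g2 and (f, h) not in cells:
                    cells.add((f, h))
                    changed = True
    return cells

# A single generating beta-step on three illustrative 1-cells.
terms = ["(\\x.x) y", "y", "fst (y, z)"]
cells = rewrite_relation(terms, {("(\\x.x) y", "y")})
assert ("(\\x.x) y", "y") in cells      # one-step rewrite
assert ("y", "(\\x.x) y") not in cells  # the relation is directed
```

Inverting a generating 2-cell would make the two terms interconvertible, matching the reading of invertible 2-cells as equality witnesses.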

The rewriting rules of the STLC arise naturally in this framework: Seely [56] observed that β-reduction and η-expansion may be respectively interpreted as the counit and unit of the adjunctions corresponding to lax (directed) products and exponentials in a 2-category (c.f. also [34,27]). This approach was taken up by Hilken [32], who developed a '2-dimensional λ-calculus' with strict products and lax exponentials to study the proof theory of rewriting in the STLC (c.f. also [33]).

Our concern here is with equational theories of rewriting, and we follow Seely in viewing weak categorical structure as a semantic model of rewriting modulo an equational theory. We are not aware of non-syntactic examples of 2-dimensional cartesian closed structure that are lax but not pseudo (i.e. up to isomorphism) and so adopt cartesian closed bicategories as our semantic framework.

From the perspective of rewriting, a sesquicategory embodies the rewriting of terms modulo the monoid laws for identities and composition, while a bicategory embodies the rewriting of terms modulo the equational theory on rewrites given by the triangle and pentagon laws of a monoidal category. Cartesian closed bicategories further embody the usual β-reductions and η-expansions of the STLC modulo an equational theory on rewrites; for instance, this identifies the composite rewrite $\langle t\_1, t\_2 \rangle \Rightarrow \langle \pi\_1\langle t\_1, t\_2\rangle, \pi\_2\langle t\_1, t\_2\rangle\rangle \Rightarrow \langle t\_1, t\_2 \rangle$ with the identity rewrite. Indeed, in the free cartesian closed bicategory over a signature of base types and constant terms, the quotient of 1-cells by the isomorphism relation provided by 2-cells is in bijection with αβη-equivalence classes of STLC-terms (c.f. [55, Chapter 5]).

*Bicategorical relative full completeness.* The bicategorical notion of relative full completeness arises by generalising from functors that are fully-faithful to

pseudofunctors F : B→C that are locally an equivalence, that is, for which every hom-functor $F\_{X,Y} : \mathcal{B}(X, Y) \to \mathcal{C}(FX, FY)$ is an equivalence of categories. Interpreted in the context of rewriting, this amounts to the conservativity of rewriting theories. First, the equational theory of rewriting in C is conservative over that in B: the hom-functors do not identify distinct rewrites. Second, the reduction relation in $\mathcal{C}(FX, FY)$ is conservative over that in $\mathcal{B}(X, Y)$: whenever $Ff \rightsquigarrow Fg$ in C then already $f \rightsquigarrow g$ in B. Third, the term structure in B gets copied by F in C: modulo the equational theory of rewrites, there are no new terms between types in the image of F.

*Contributions.* This paper makes two main contributions.

Our first contribution, in Section 3, is to introduce the bicategorical glueing construction and, in Section 4, to initiate the development of its theory. As well as providing an assurance that our notion is the right one, this establishes the basic framework for applications. Importantly, we bicategorify the fundamental folklore result (e.g. [40,12,62]) establishing mild conditions under which a glued bicategory is cartesian closed.

Our second contribution, in Section 5, is to employ bicategorical glueing to show that for a bicategory B with finite-product completion F×[B] and cartesian-closed completion F×,→[B], the universal pseudofunctor B → F×,→[B] and its universal finite-product-preserving extension F×[B] → F×,→[B] are both locally an equivalence. Since one may directly observe that the universal pseudofunctor B → F×[B] is locally an equivalence, we obtain relative full completeness results for bicategorical cartesian closed structure mirroring those of the categorical setting. Establishing this proof-theoretically would require the development of a 2-dimensional proof theory. Given the complexities already present at the categorical level, this seems a serious and interesting undertaking. Here, once the basic bicategorical theory has been established, the proof is relatively compact. This highlights the effectiveness of our approach for the application.

The result may also be expressed type-theoretically. For instance, in terms of the type theories of [20], the type theory $\Lambda^{\times,\to}\_{\mathrm{ps}}$ for cartesian closed bicategories is a conservative extension of the type theory $\Lambda^{\times}\_{\mathrm{ps}}$ for finite-product bicategories. It follows that, modulo the equational theory of bicategorical products and exponentials, any rewrite between STPC-terms constructed using the βη-rewrites for both products and exponentials may be equally presented as constructed from just the βη-rewrites for products (see [21,55]).

*Further work.* We view the foundational theory presented here as the starting point for future work. For instance, we plan to incorporate further type structure into the development, such as coproducts (c.f. [22,16,4]) and monoidal structure (c.f. [31]).

On the other hand, the importance of glueing in the categorical setting suggests that its bicategorical counterpart will find a range of applications. A case in point, which has already been developed, is the proof of a 2-dimensional normalisation property for the type theory $\Lambda^{\times,\to}\_{\mathrm{ps}}$ for cartesian closed bicategories of [20] that entails a corresponding bicategorical coherence theorem [21,55]. There

are also a variety of syntactic constructions in programming languages and type theory that naturally come with a 2-dimensional semantics (see e.g. the use of 2-categorical constructions in [23,14,6,61,35]). In such scenarios, bicategorical glueing may prove useful for establishing properties corresponding to the notions of adequacy and/or canonicity, or for proving further conservativity properties.

## **2 Cartesian closed bicategories**

We begin by briefly recapitulating the basic theory of bicategories, including the definition of cartesian closure. A summary of the key definitions is in [41]; for a more extensive introduction see e.g. [5,7].

## **2.1 Bicategories**

Bicategories axiomatise structures in which the associativity and unit laws of composition only hold up to coherent isomorphism, for instance when composition is defined by a universal property. They are rife in mathematics and theoretical computer science, appearing in the semantics of computation [29,11,49], datatype models [1,13], categorical logic [26], and categorical algebra [19,25,18].

**Definition 1 ([5]).** A bicategory B consists of

- a collection of objects X, Y, Z, ...;
- for every pair of objects X, Y a hom-category $\mathcal{B}(X, Y)$, whose objects $f : X \to Y$ are called 1-cells and whose morphisms $\alpha : f \Rightarrow g$ are called 2-cells;
- for every object X an identity 1-cell $\mathrm{Id}\_X \in \mathcal{B}(X, X)$;
- composition functors $\circ : \mathcal{B}(Y, Z) \times \mathcal{B}(X, Y) \to \mathcal{B}(X, Z)$; and
- invertible structural 2-cells
$$\begin{aligned} \mathbf{a}\_{h,g,f} &: (h \circ g) \circ f \Rightarrow h \circ (g \circ f) : W \to Z\\ \mathbf{l}\_f &: \mathrm{Id}\_X \circ f \Rightarrow f : W \to X\\ \mathbf{r}\_g &: g \circ \mathrm{Id}\_X \Rightarrow g : X \to Y \end{aligned}$$

for every f : W → X, g : X → Y and h : Y → Z, natural in each of their parameters and satisfying a triangle law and a pentagon law analogous to those for monoidal categories.


A bicategory has three notions of 'opposite', depending on whether one reverses 1-cells, 2-cells, or both (see e.g. [37, §1.6]). We shall only require the following.

**Definition 2.** The opposite of a bicategory B, denoted $\mathcal{B}^{op}$, is obtained by setting $\mathcal{B}^{op}(X, Y) := \mathcal{B}(Y, X)$ for all X, Y ∈ B.

A morphism of bicategories is called a pseudofunctor (or homomorphism) [5]. It is a mapping on objects, 1-cells and 2-cells that preserves horizontal composition up to isomorphism. Vertical composition is preserved strictly.

**Definition 3.** <sup>A</sup> pseudofunctor (F, φ, ψ) : B→C between bicategories <sup>B</sup> and <sup>C</sup> consists of

- a function F on objects;
- a functor $F\_{X,Y} : \mathcal{B}(X, Y) \to \mathcal{C}(FX, FY)$ for every X, Y ∈ B; and
- invertible 2-cells $\phi\_{g,f} : Fg \circ Ff \Rightarrow F(g \circ f)$ and $\psi\_X : \mathrm{Id}\_{FX} \Rightarrow F(\mathrm{Id}\_X)$, natural in their parameters,
subject to two unit laws and an associativity law. A pseudofunctor for which φ and ψ are both the identity is called strict. A pseudofunctor is called locally P if every functor $F\_{X,Y}$ satisfies the property P.

Example 2. A monoidal category is equivalently a one-object bicategory; a monoidal functor is equivalently a pseudofunctor between one-object bicategories.

Pseudofunctors F, G : B→C are related by pseudonatural transformations. A pseudonatural transformation $(k, \overline{k}) : F \Rightarrow G$ consists of a family of 1-cells $(k\_X : FX \to GX)\_{X \in \mathcal{B}}$ and, for every $f : X \to Y$, an invertible 2-cell $\overline{k}\_f : k\_Y \circ Ff \Rightarrow Gf \circ k\_X$ witnessing naturality. The 2-cells $\overline{k}\_f$ are required to be natural in f and satisfy two coherence axioms. A morphism of pseudonatural transformations is called a modification, and may be thought of as a coherent family of 2-cells.

Notation 1. For bicategories B and C we write **Bicat**(B, C) for the (possibly large) bicategory of pseudofunctors, pseudonatural transformations, and modifications (see e.g. [41]). If C is a 2-category, then so is **Bicat**(B, C). We write **Cat** for the 2-category of small categories and think of the 2-category **Bicat**(B^op, **Cat**) as a bicategorical version of the presheaf category $\mathbf{Set}^{\mathbf{C}^{op}}$. As for presheaf categories, one must take care to avoid size issues. We therefore adopt the convention that when considering **Bicat**(B^op, **Cat**) the bicategory B is small or locally small as appropriate.

Example 3. For every bicategory B and X ∈ B there exists the representable pseudofunctor $\mathbf{Y}X : \mathcal{B}^{op} \to \mathbf{Cat}$, defined by $\mathbf{Y}X := \mathcal{B}(-, X)$. The 2-cells φ and ψ are structural isomorphisms.

The notion of equivalence between bicategories is called biequivalence. A biequivalence $\mathcal{B} \simeq \mathcal{C}$ consists of a pair of pseudofunctors $F : \mathcal{B} \rightleftarrows \mathcal{C} : G$ together with equivalences $F \circ G \simeq \mathrm{id}\_{\mathcal{C}}$ and $G \circ F \simeq \mathrm{id}\_{\mathcal{B}}$ in **Bicat**(C, C) and **Bicat**(B, B) respectively. Equivalences in an arbitrary bicategory are defined by analogy with equivalences of categories, see e.g. [42, pp. 28].

Remark 1. The coherence theorem for monoidal categories [44, Chapter VII] generalises to bicategories: any bicategory is biequivalent to a 2-category [45] (see [42] for a readable summary of the argument). We are therefore justified in writing simply $\cong$ for composites of **a**, **l** and **r**.

As a rule of thumb, a category-theoretic proposition lifts to a bicategorical proposition so long as one takes care to weaken isomorphisms to equivalences and sprinkle the prefixes 'pseudo' and 'bi' in appropriate places. For instance, bicategorical adjoints are called biadjoints and bicategorical limits are called bilimits [59]. The latter may be thought of as limits in which every cone is filled by a coherent choice of invertible 2-cell. Bilimits are preserved by representable pseudofunctors and by right biadjoints. The bicategorical Yoneda lemma [59, §1.9] says that for any pseudofunctor $P : \mathcal{B}^{op} \to \mathbf{Cat}$, evaluation at the identity determines a pseudonatural family of equivalences $\mathbf{Bicat}(\mathcal{B}^{op}, \mathbf{Cat})(\mathbf{Y}X, P) \simeq PX$. One may then deduce that the Yoneda pseudofunctor $\mathbf{Y} : \mathcal{B} \to \mathbf{Bicat}(\mathcal{B}^{op}, \mathbf{Cat}) : X \mapsto \mathbf{Y}X$ is locally an equivalence. Another 'bicategorified' lemma is the following, which we shall employ in Section 5.

**Lemma 1.** 1. For pseudofunctors F, G : B→C, if $F \simeq G$ and G is locally an equivalence, then so is F.

2. For pseudofunctors F : A→B, G : B→C, H : C→D, if G ◦ F and H ◦ G are local equivalences, then so is F.

## **2.2 fp-Bicategories**

It is convenient to directly consider all finite products, as this reduces the need to deal with the equivalent objects given by re-bracketing binary products. To avoid confusion with the 'cartesian bicategories' of Carboni and Walters [10,8], we call a bicategory with all finite products an fp-bicategory.

**Definition 4.** An fp-bicategory (B, <sup>Π</sup>n(−)) is a bicategory <sup>B</sup> equipped with the following data for every <sup>A</sup><sup>1</sup>,...,An ∈ B (<sup>n</sup> <sup>∈</sup> **<sup>N</sup>**):

- an object $\prod\_n(A\_1, \dots, A\_n)$;
- projection 1-cells $\pi\_i : \prod\_n(A\_1, \dots, A\_n) \to A\_i$ for $i = 1, \dots, n$; and
- for every X ∈ B, an adjunction

$$(\pi\_1 \circ -, \dots, \pi\_n \circ -) : \mathcal{B}\big(X, \prod\_{n} (A\_1, \dots, A\_n)\big) \rightleftarrows \prod\_{i=1}^n \mathcal{B}(X, A\_i) : \langle -, \dots, = \rangle \tag{1}$$

specified by choosing a family of universal arrows (see e.g. [44, Theorem IV.2]) with components $\varpi^{(i)}\_{f\_1,\dots,f\_n} : \pi\_i \circ \langle f\_1, \dots, f\_n \rangle \Rightarrow f\_i$ for $i = 1, \dots, n$.

We call the right adjoint $\langle -, \dots, = \rangle$ the n-ary tupling.

Explicitly, the universal property of $\varpi = (\varpi^{(1)}, \dots, \varpi^{(n)})$ is the following. For any finite family of 2-cells $(\alpha\_i : \pi\_i \circ g \Rightarrow f\_i : X \to A\_i)\_{i=1,\dots,n}$, there exists a 2-cell $\mathfrak{p}^\dagger(\alpha\_1, \dots, \alpha\_n) : g \Rightarrow \langle f\_1, \dots, f\_n \rangle : X \to \prod\_n(A\_1, \dots, A\_n)$, unique such that

$$
\varpi\_{f\_1,\dots,f\_n}^{(k)} \bullet \left(\pi\_k \circ \mathfrak{p}^\dagger(\alpha\_1,\dots,\alpha\_n)\right) = \alpha\_k : \pi\_k \circ g \Rightarrow f\_k.
$$

for $k = 1, \dots, n$. One thereby obtains a functor $\langle -, \dots, = \rangle$ and an adjunction as in (1) with counit $\varpi = (\varpi^{(1)}, \dots, \varpi^{(n)})$ and unit $\varsigma\_g := \mathfrak{p}^\dagger(\mathrm{id}\_{\pi\_1 \circ g}, \dots, \mathrm{id}\_{\pi\_n \circ g}) : g \Rightarrow \langle \pi\_1 \circ g, \dots, \pi\_n \circ g \rangle$. This defines a lax n-ary product structure: one merely obtains an adjunction in (1). One turns it into a bicategorical (pseudo) product by further requiring the unit and counit to be invertible. The terminal object **1** arises as $\prod\_0()$. We adopt the same notation as for categorical products, for example by writing $\prod\_{i=1}^n A\_i$ for $\prod\_n(A\_1, \dots, A\_n)$ and $\prod\_{i=1}^n f\_i$ for $\langle f\_1 \circ \pi\_1, \dots, f\_n \circ \pi\_n \rangle$.

Example 4. The bicategory of spans over a lextensive category [9] has finite products; such a bicategory is biequivalent to its opposite, so these are in fact biproducts [38, Theorem 6.2]. Biproduct structure arises using the coproduct structure of the underlying category (c.f. the biproduct structure of the category of relations).

Remark 2 (c.f. Remark 1). fp-Bicategories satisfy the following coherence theorem: every fp-bicategory is biequivalent to a 2-category with 2-categorical products [52, Theorem 4.1]. Thus, we shall sometimes simply write ∼= in diagrams for composites of 2-cells arising from either the bicategorical or product structure. In pasting diagrams we shall omit such 2-cells completely (c.f. [30, Remark 3.1.16]; for a detailed exposition, see [64, Appendix A]).

One may think of bicategorical product structure as an intensional version of the familiar categorical structure, except the usual equations (e.g. [28]) are now witnessed by natural families of invertible 2-cells. It is useful to introduce explicit names for these 2-cells.

Notation 2. In the following, and throughout, we write $A\_\bullet$ for a finite sequence $A\_1, \dots, A\_n$.

**Lemma 2.** For any fp-bicategory (B, <sup>Π</sup>n(−)) there exist canonical choices for the following natural families of invertible 2-cells:


In particular, it follows from Lemma 2(2) that there exists a canonical natural family of invertible 2-cells $\Phi\_{h\_\bullet, g\_\bullet} : (\prod\_{i=1}^n h\_i) \circ (\prod\_{i=1}^n g\_i) \Rightarrow \prod\_{i=1}^n (h\_i \circ g\_i)$ for any $(h\_i : A\_i \to B\_i)\_{i=1,\dots,n}$ and $(g\_j : X\_j \to A\_j)\_{j=1,\dots,n}$.

In the categorical setting, a cartesian functor preserves products up to isomorphism. An fp-pseudofunctor preserves bicategorical products up to equivalence.

**Definition 5.** An fp-pseudofunctor (F, <sup>q</sup>×) between fp-bicategories (B, <sup>Π</sup>n(−)) and (C, <sup>Π</sup>n(−)) is a pseudofunctor <sup>F</sup> : B→C equipped with specified equivalences

$$\langle F\pi\_1, \dots, F\pi\_n \rangle : F\big(\prod\_{i=1}^n A\_i\big) \;\simeq\; \prod\_{i=1}^n (FA\_i) : \mathbf{q}^{\times}\_{A\_\bullet}$$

for every $A\_1, \dots, A\_n \in \mathcal{B}$ ($n \in \mathbf{N}$). We denote the 2-cells witnessing these equivalences by $\mathsf{u}^{\times}\_{A\_\bullet} : \mathrm{Id}\_{\prod\_i FA\_i} \Rightarrow \langle F\pi\_1, \dots, F\pi\_n \rangle \circ \mathbf{q}^{\times}\_{A\_\bullet}$ and $\mathsf{c}^{\times}\_{A\_\bullet} : \mathbf{q}^{\times}\_{A\_\bullet} \circ \langle F\pi\_1, \dots, F\pi\_n \rangle \Rightarrow \mathrm{Id}\_{F(\prod\_i A\_i)}$. We call $(F, \mathbf{q}^{\times})$ strict if F is strict and satisfies

$$\begin{aligned} F(\prod\_n (A\_1, \dots, A\_n)) &= \prod\_n (FA\_1, \dots, FA\_n) & F(\pi\_i^{A\_1, \dots, A\_n}) &= \pi\_i^{FA\_1, \dots, FA\_n} \\ F\left< t\_1, \dots, t\_n \right> &= \left< Ft\_1, \dots, Ft\_n \right> & \mathbf{q}\_{A\_1, \dots, A\_n}^{\times} &= \mathrm{Id}\_{\Pi\_n(FA\_1, \dots, FA\_n)} \end{aligned}$$

with equivalences given by the 2-cells $\mathfrak{p}^\dagger(\mathbf{r}\_{\pi\_1}, \dots, \mathbf{r}\_{\pi\_n}) : \mathrm{Id} \overset{\cong}{\Rightarrow} \langle \pi\_1, \dots, \pi\_n \rangle$.

Notation 3. For fp-bicategories <sup>B</sup> and <sup>C</sup> we write **fp**-**Bicat**(B, <sup>C</sup>) for the bicategory of fp-pseudofunctors, pseudonatural transformations and modifications.<sup>3</sup>

We define two further families of 2-cells to witness standard properties of cartesian functors. The first witnesses the fact that any fp-pseudofunctor commutes with the $\prod\_n(-, \dots, =)$ operation. The second witnesses the equivalence $\langle F\pi\_1, \dots, F\pi\_n \rangle \circ F\langle f\_1, \dots, f\_n \rangle \cong \langle Ff\_1, \dots, Ff\_n \rangle$, 'unpacking' an n-ary tupling from inside F.

**Lemma 3.** Let (F, <sup>q</sup>×):(B, <sup>Π</sup>n(−)) <sup>→</sup> (C, <sup>Π</sup>n(−)) be an fp-pseudofunctor.

1. For any finite family of 1-cells $(f\_i : A\_i \to A'\_i)\_{i=1,\dots,n}$ in B, there exists an invertible 2-cell $\mathsf{nat}\_{f\_\bullet} : \mathbf{q}^{\times}\_{A'\_\bullet} \circ \prod\_{i=1}^n Ff\_i \Rightarrow F(\prod\_{i=1}^n f\_i) \circ \mathbf{q}^{\times}\_{A\_\bullet}$ such that the pair $(\mathbf{q}^{\times}, \mathsf{nat})$ forms a pseudonatural transformation

$$\prod\_{i=1}^{n} \big( F(-), \dots, F(=) \big) \Rightarrow \big( F \circ \prod\_{i=1}^{n} \big)(-, \dots, =)$$

2. For any finite family of 1-cells $(f\_i : X \to B\_i)\_{i=1,\dots,n}$ in B, there exists a canonical choice of naturally invertible 2-cell $\mathsf{unpack}\_{f\_\bullet} : \langle F\pi\_1, \dots, F\pi\_n \rangle \circ F\langle f\_1, \dots, f\_n \rangle \Rightarrow \langle Ff\_1, \dots, Ff\_n \rangle : FX \to \prod\_{i=1}^n FB\_i$.

## **2.3 Cartesian closed bicategories**

A cartesian closed bicategory is an fp-bicategory $(\mathcal{B}, \prod\_n(-))$ equipped with a biadjunction $(-) \times A \dashv (A \Rightarrow -)$ for every A ∈ B. Examples include the bicategory of generalised species [17], bicategories of concurrent games [49], and bicategories of operads [26].

<sup>3</sup> In the categorical setting, every natural transformation between cartesian functors is monoidal with respect to the cartesian structure and a similar fact is true bicategorically: every pseudonatural transformation is canonically compatible with the product structure, see [55, § 4.1.1].

**Definition 6.** A cartesian closed bicategory or cc-bicategory is an fp-bicategory (B, <sup>Π</sup>n(−)) equipped with the following data for every A, B ∈ B:

- an object $A \Rightarrow B$;
- a 1-cell $\mathrm{eval}\_{A,B} : (A \Rightarrow B) \times A \to B$; and
- for every X ∈ B, an adjunction

$$\mathrm{eval}\_{A,B} \circ ((-) \times A) : \mathcal{B}(X, A \Rightarrow B) \rightleftarrows \mathcal{B}(X \times A, B) : \lambda(-)$$

specified by a choice of universal arrow $\varepsilon\_f : \mathrm{eval}\_{A,B} \circ (\lambda f \times A) \overset{\cong}{\Rightarrow} f$.

We call the functor λ(−) currying and refer to λf as the currying of f.

Explicitly, the counit ε satisfies the following universal property. For every 1-cell $g : X \to (A \Rightarrow B)$ and 2-cell $\alpha : \mathrm{eval}\_{A,B} \circ (g \times A) \Rightarrow f$ there exists a unique 2-cell $\mathsf{e}^\dagger(\alpha) : g \Rightarrow \lambda f$ such that $\varepsilon\_f \bullet \big(\mathrm{eval}\_{A,B} \circ (\mathsf{e}^\dagger(\alpha) \times A)\big) = \alpha$. This defines a lax exponential structure. One obtains a pseudo (bicategorical) exponential structure by further requiring that ε and the unit $\eta\_t := \mathsf{e}^\dagger(\mathrm{id}\_{\mathrm{eval}\_{A,B} \circ (t \times A)})$ are invertible.
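In the 1-categorical case of Set the counit ε is again an identity: evaluating a curried function recovers the original one. The following Python sketch (ours, not from the paper) spells out eval, λ(−), and the triangle equation $\mathrm{eval} \circ (\lambda f \times A) = f$.

```python
def curry(f):
    """lambda(f): turn f : X x A -> B into X -> (A -> B)."""
    return lambda x: lambda a: f((x, a))

def ev(pair):
    """eval_{A,B} : (A => B) x A -> B: apply the function to the argument."""
    g, a = pair
    return g(a)

f = lambda xa: xa[0] * 10 + xa[1]  # a sample f : X x A -> B
lf = curry(f)

# Counit as an on-the-nose equality: eval . (curry(f) x A) = f.
assert all(ev((lf(x), a)) == f((x, a)) for x in range(5) for a in range(5))
```

The bicategorical definition above replaces this equality by the invertible 2-cell $\varepsilon\_f$, with the stated universal property.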

Example 5. Every 'presheaf' 2-category **Bicat**(B^op, **Cat**) has all bicategorical limits [52, Proposition 3.6], given pointwise, and is cartesian closed with $(P \Rightarrow Q)X := \mathbf{Bicat}(\mathcal{B}^{op}, \mathbf{Cat})(\mathbf{Y}X \times P, Q)$ [55, Chapter 6].

As for products, we adopt the notational conventions that are standard in the categorical setting, for example by writing $(f \Rightarrow g) : (A \Rightarrow B) \to (A' \Rightarrow B')$ for the currying of $(g \circ \mathrm{eval}\_{A,B}) \circ (\mathrm{Id}\_{A \Rightarrow B} \times f)$.

Just as fp-pseudofunctors preserve products up to equivalence, cartesian closed pseudofunctors preserve products and exponentials up to equivalence.

**Definition 7.** A cartesian closed pseudofunctor or cc-pseudofunctor between cc-bicategories $(\mathcal{B}, \prod\_n(-), \Rightarrow)$ and $(\mathcal{C}, \prod\_n(-), \Rightarrow)$ is an fp-pseudofunctor $(F, \mathbf{q}^{\times})$ equipped with specified equivalences $\mathsf{m}\_{A,B} : F(A \Rightarrow B) \simeq (FA \Rightarrow FB) : \mathbf{q}^{\Rightarrow}\_{A,B}$ for every A, B ∈ B, where $\mathsf{m}\_{A,B} : F(A \Rightarrow B) \to (FA \Rightarrow FB)$ is the currying of $F(\mathrm{eval}\_{A,B}) \circ \mathbf{q}^{\times}\_{A \Rightarrow B, A}$. A cc-pseudofunctor $(F, \mathbf{q}^{\times}, \mathbf{q}^{\Rightarrow})$ is strict if $(F, \mathbf{q}^{\times})$ is a strict fp-pseudofunctor such that

$$\begin{aligned} F(A \Rightarrow B) &= (FA \Rightarrow FB) & \mathbf{q}^{\Rightarrow}\_{A,B} &= \mathrm{Id}\_{(FA \Rightarrow FB)} \\ F(\mathrm{eval}\_{A,B}) &= \mathrm{eval}\_{FA,FB} & F(\varepsilon\_t) &= \varepsilon\_{Ft} \\ F(\lambda t) &= \lambda (Ft) \end{aligned}$$

with equivalences given by the 2-cells

$$\mathsf{e}^{\dagger}(\mathrm{eval}\_{FA,FB} \circ \kappa) : \mathrm{Id}\_{(FA \Rightarrow FB)} \overset{\cong}{\Rightarrow} \lambda\big(\mathrm{eval}\_{FA,FB} \circ \mathrm{Id}\_{(FA \Rightarrow FB) \times FA}\big)$$

where κ is the canonical isomorphism $\mathrm{Id}\_{FA \Rightarrow FB} \times FA \cong \mathrm{Id}\_{(FA \Rightarrow FB) \times FA}$.

Remark 3. As is well-known in the case of **Cat** (e.g. [44, IV.2]), every equivalence X Y in a bicategory gives rise to an adjoint equivalence between X and Y with the same 1-cells (see e.g. [42, pp. 28–29]). Thus, one may assume without loss of generality that all the equivalences in the preceding definition are adjoint equivalences. The same observation applies to the definition of fp-pseudofunctors.

Notation 4. For cc-bicategories <sup>B</sup> and <sup>C</sup> we write **cc**-**Bicat**(B, <sup>C</sup>) for the bicategory of cc-pseudofunctors, pseudonatural transformations and modifications (c.f. Notation 3).

## **3 Bicategorical glueing**

The glueing construction has been discovered in various forms, with correspondingly various names: the notions of logical relation [50,57], sconing [24], Freyd covers, and glueing (e.g. [40]) are all closely related (see e.g. [47] for an overview of the connections). Originally presented set-theoretically, the technique was quickly given categorical expression [43,47] and is now a standard component of the armoury for studying type theories (e.g. [40,12]).

The glueing gl(F) of categories **C** and **D** along a functor F : **C** → **D** may be defined as the comma category $(\mathrm{id}\_{\mathbf{D}} \downarrow F)$. We define bicategorical glueing analogously.

**Definition 8.**

1. Let F : A→C and G : B→C be pseudofunctors of bicategories. The comma bicategory $(F \downarrow G)$ has as objects triples $(A \in \mathcal{A},\ f : FA \to GB,\ B \in \mathcal{B})$. The 1-cells $(A, f, B) \to (A', f', B')$ are triples $(p, \alpha, q)$, where $p : A \to A'$ and $q : B \to B'$ are 1-cells and α is an invertible 2-cell $\alpha : f' \circ Fp \Rightarrow Gq \circ f$. The 2-cells $(p, \alpha, q) \Rightarrow (p', \alpha', q')$ are pairs of 2-cells $(\sigma : p \Rightarrow p',\ \tau : q \Rightarrow q')$ such that the following diagram commutes:

$$\begin{array}{ccc} f' \circ F(p) & \xrightarrow{f' \circ F(\sigma)} f' \circ F(p')\\ \alpha & & \downarrow \alpha'\\ G(q) \circ f & \xrightarrow{G(\tau) \circ f} G(q') \circ f \end{array} \tag{2}$$

Identities and horizontal composition are given by the following pasting diagrams.

Vertical composition, the identity 2-cell, and the structural isomorphisms are given component-wise.

2. The glueing bicategory gl(J) of bicategories B and C along a pseudofunctor J : B→C is the comma bicategory (id<sup>C</sup> ↓ J).

We call axiom (2) the cylinder condition due to its shape when viewed as a (3-dimensional) pasting diagram. Note that one directly obtains projection pseudofunctors $\mathcal{B} \xleftarrow{\ \pi\_{\mathrm{dom}}\ } \mathrm{gl}(J) \xrightarrow{\ \pi\_{\mathrm{cod}}\ } \mathcal{C}$.
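At the 1-categorical level the comma construction underlying Definition 8 can be sketched very directly. The following Python fragment (illustrative names, finite functions as dicts) represents a morphism of the glued category by its pair of components and checks the strict version of the cylinder condition, the commuting square $f' \circ Fp = Gq \circ f$ that the invertible 2-cell α weakens.

```python
def compose(g, f):
    """Composition g . f of finite functions represented as dicts."""
    return {x: g[f[x]] for x in f}

def is_glued_morphism(Fp, Gq, f, f_prime):
    """1-categorical cylinder condition: f' . F(p) == G(q) . f."""
    return compose(f_prime, Fp) == compose(Gq, f)

# A toy instance: we pass the images F(p), G(q) of the component arrows.
f       = {0: "a", 1: "a"}   # f  : FA  -> GB
f_prime = {0: "b", 1: "b"}   # f' : FA' -> GB'
Fp      = {0: 0, 1: 1}       # F(p) : FA -> FA'
Gq      = {"a": "b"}         # G(q) : GB -> GB'

assert is_glued_morphism(Fp, Gq, f, f_prime)
```

Composition of such morphisms is componentwise, mirroring the pasting diagrams of the bicategorical definition.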

We develop some basic theory of glueing bicategories, which we shall put to use in Section 5. We follow the terminology of [15].

**Definition 9.** Let J : B→X be a pseudofunctor. The relative hom-pseudofunctor $\langle J \rangle : \mathcal{X} \to \mathbf{Bicat}(\mathcal{B}^{op}, \mathbf{Cat})$ is defined by $\langle J \rangle X := \mathcal{X}(J(-), X)$.

Following [15], one might call the glueing bicategory $\mathrm{gl}(\langle J \rangle)$ associated to a relative hom-pseudofunctor the bicategory of B-intensional Kripke relations of arity J, and view it as an intensional, bicategorical, version of the category of Kripke relations.

The relative hom-pseudofunctor preserves all bilimits that exist in its domain. For products, this may be described explicitly.

**Lemma 4.** For any fp-bicategory $(\mathcal{X}, \prod\_n(-))$ and pseudofunctor J : B→X , the relative hom-pseudofunctor $\langle J \rangle$ extends canonically to an fp-pseudofunctor.

Proof. Take q<sup>×</sup> <sup>X</sup>• to be the <sup>n</sup>-ary tupling <sup>n</sup> i=1<sup>X</sup> (J(−), Xi) −→ X (J(−), n i=1Xi). This forms a pseudonatural transformation with naturality witnessed by post.

For any pseudofunctor J : B→X there exists a pseudonatural transformation $(\mathsf{l}, \overline{\mathsf{l}}) : \mathbf{Y} \Rightarrow \langle J \rangle \circ J : \mathcal{B} \to \mathbf{Bicat}(\mathcal{B}^{op}, \mathbf{Cat})$ given by the functorial action of J on hom-categories. One may therefore define the following.

**Definition 10.** For any pseudofunctor J : B→X , define the extended Yoneda pseudofunctor $\underline{\mathbf{Y}} : \mathcal{B} \to \mathrm{gl}(\langle J \rangle)$ by setting $\underline{\mathbf{Y}}B := \big(\mathbf{Y}B, (\mathsf{l}, \overline{\mathsf{l}})\_{(-,B)}, JB\big)$, $\underline{\mathbf{Y}}f := \big(\mathbf{Y}f, (\phi^{J}\_{-,f})^{-1}, Jf\big)$, and $\underline{\mathbf{Y}}(\tau : f \Rightarrow f' : B \to B') := (\mathbf{Y}\tau, J\tau)$. The cylinder condition holds by the naturality of $\phi^{J}$, and the 2-cells $\phi^{\underline{\mathbf{Y}}}$ and $\psi^{\underline{\mathbf{Y}}}$ are $(\phi^{\mathbf{Y}}, \phi^{J})$ and $(\psi^{\mathbf{Y}}, \psi^{J})$, respectively.

The extended Yoneda pseudofunctor satisfies a corresponding 'extended Yoneda lemma' (c.f. [15, pp. 33]).

**Lemma 5.** For any pseudofunctor J : B→X and $\underline{P} = (P, (k, \overline{k}), X) \in \mathrm{gl}(\langle J \rangle)$ there exists an equivalence of pseudofunctors $\mathrm{gl}(\langle J \rangle)(\underline{\mathbf{Y}}(-), \underline{P}) \simeq P$ and an invertible modification as in the diagram below. Hence $\underline{\mathbf{Y}}$ is locally an equivalence.

Proof. The arrow marked $\simeq$ is the composite of a projection and the equivalence arising from the Yoneda lemma. Its pseudo-inverse is the composite

$$P \xrightarrow{\;\simeq\;} \mathbf{Bicat}(\mathcal{B}^{op}, \mathbf{Cat})(\mathbf{Y}(-), P) \to \operatorname{gl}(\langle \mathfrak{J} \rangle)(\underline{\mathbf{Y}}(-), \underline{P}) \tag{3}$$

in which the equivalence arises from the Yoneda lemma and the unlabelled pseudofunctor takes a pseudonatural transformation $(\mathrm{j}, \overline{\mathrm{j}}) : \mathbf{Y}B \Rightarrow P$ to the triple with first component $(\mathrm{j}, \overline{\mathrm{j}})$, third component $\mathrm{j}_B(\mathrm{k}_B(\mathrm{Id}_B)) : \mathfrak{J}B \to X$, and second component defined using $\overline{\mathrm{k}}$ and $\overline{\mathrm{j}}$. Chasing the definitions through and evaluating at $A, B \in \mathcal{B}$, one sees that when $\underline{P} := \underline{\mathbf{Y}}B$ the composite (3) is equivalent to $\underline{\mathbf{Y}}_{A,B}$. Since (3) is locally an equivalence, Lemma 1(1) completes the proof.

## **4 Cartesian closed structure on the glueing bicategory**

It is well known that if **C** and **D** are cartesian closed categories, **D** has pullbacks, and $F : \mathbf{C} \to \mathbf{D}$ is cartesian, then $\mathrm{gl}(F)$ is cartesian closed (e.g. [40,12]). In this section we prove a corresponding result for the glueing bicategory. We shall be guided by the categorical proof, for which see e.g. [43, Proposition 2].
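The 1-categorical glueing (comma) construction just mentioned can be sketched concretely: objects of gl(F) are triples (C, c, B) with c : C → F(B), and a morphism (g, f) must make the evident square commute. The following Python sketch (all names such as `GlObject` and `is_gl_morphism` are ours, not from the paper) checks that commuting square for finite sets, with functions represented as dicts and F an arbitrary functor-on-morphisms:

```python
# A minimal 1-categorical sketch of the glueing category gl(F).
# Objects are triples (C, c, B) with c : C -> F(B); a morphism
# (g, f) : (C, c, B) -> (C', c', B') must satisfy F(f) . c = c' . g.
# Illustrative names only; not the paper's formal development.

from dataclasses import dataclass
from typing import Callable, Dict

Fn = Dict[object, object]  # a function between finite sets, as a dict

def compose(g: Fn, f: Fn) -> Fn:
    """Composite g . f of two functions-as-dicts."""
    return {x: g[f[x]] for x in f}

@dataclass(frozen=True)
class GlObject:
    C: frozenset   # domain-category object (a finite set)
    c: tuple       # the structure map C -> F(B), as hashable dict items
    B: frozenset   # codomain-category object

def is_gl_morphism(F: Callable[[Fn], Fn], src: GlObject, tgt: GlObject,
                   g: Fn, f: Fn) -> bool:
    """Check the glueing square: F(f) . c == c' . g."""
    c, c2 = dict(src.c), dict(tgt.c)
    return compose(F(f), c) == compose(c2, g)

# Example with F the identity functor on finite sets:
F = lambda f: f
X = GlObject(frozenset({'a', 'b'}), tuple({'a': 0, 'b': 1}.items()),
             frozenset({0, 1}))
Y = GlObject(frozenset({'x'}), tuple({'x': 0}.items()), frozenset({0}))
g = {'a': 'x', 'b': 'x'}
f = {0: 0, 1: 0}
print(is_gl_morphism(F, X, Y, g, f))  # True: the square commutes
```

The same shape (a 1-cell together with a witnessing square, here a mere equation) is what the bicategorical development below weakens to an invertible 2-cell.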

#### **4.1 Finite products in gl(**J**)**

**Proposition 1.** Let $(\mathcal{B}, \Pi_n(-))$ and $(\mathcal{C}, \Pi_n(-))$ be fp-bicategories and $(\mathfrak{J}, \mathrm{q}^{\times}) : \mathcal{B} \to \mathcal{C}$ be an fp-pseudofunctor. Then $\mathrm{gl}(\mathfrak{J})$ is an fp-bicategory, with both projection pseudofunctors $\pi_{\mathrm{dom}}$ and $\pi_{\mathrm{cod}}$ strictly preserving products.

For a family of objects $(C_i, c_i, B_i)_{i=1,\dots,n}$, the $n$-ary product $\prod_{i=1}^{n}(C_i, c_i, B_i)$ is defined to be the tuple $\big(\prod_{i=1}^{n} C_i,\ \mathrm{q}^{\times}_{B_\bullet} \circ \prod_{i=1}^{n} c_i,\ \prod_{i=1}^{n} B_i\big)$. The $k$th projection $\underline{\pi}_k$ is $(\pi_k, \overline{\mu}_k, \pi_k)$, where $\overline{\mu}_k$ is defined by commutativity of the following diagram:

For an $n$-ary family of 1-cells $(g_i, \overline{\alpha}_i, f_i) : (Y, y, X) \to (C_i, c_i, B_i)$ $(i = 1, \dots, n)$, the $n$-ary tupling is $\big(\langle g_1, \dots, g_n \rangle, \{\overline{\alpha}_1, \dots, \overline{\alpha}_n\}, \langle f_1, \dots, f_n \rangle\big)$, where $\{\overline{\alpha}_1, \dots, \overline{\alpha}_n\}$ is the composite


Finally, for every family of 1-cells $(g_i, \overline{\alpha}_i, f_i) : (Y, y, X) \to (C_i, c_i, B_i)$ $(i = 1, \dots, n)$ we require a glued 2-cell $\underline{\pi}_k \circ \big(\langle g_1, \dots, g_n \rangle, \{\overline{\alpha}_1, \dots, \overline{\alpha}_n\}, \langle f_1, \dots, f_n \rangle\big) \Rightarrow (g_k, \overline{\alpha}_k, f_k)$ to act as the counit. We simply take the pair $(\varpi^{(k)}_{g_\bullet}, \varpi^{(k)}_{f_\bullet})$ of $k$th product counits in $\mathcal{C}$ and $\mathcal{B}$. This pair forms a 2-cell in $\mathrm{gl}(\mathfrak{J})$, and the required universal property holds pointwise.

Remark 4. If $(\mathfrak{J}, \mathrm{q}^{\times}) : \mathcal{B} \to \mathcal{X}$ is an fp-pseudofunctor, then $\underline{\mathbf{Y}} : \mathcal{B} \to \mathrm{gl}(\langle\mathfrak{J}\rangle)$ canonically extends to an fp-pseudofunctor. The pseudoinverse to $\langle \underline{\mathbf{Y}}\pi_1, \dots, \underline{\mathbf{Y}}\pi_n \rangle$ is $(\langle -, \dots, = \rangle, \cong, \mathrm{q}^{\times})$, where the component of the isomorphism at $(f_i : X \to B_i)_{i=1,\dots,n}$ is the composite

$$\mathfrak{J}\langle f_\bullet \rangle \xRightarrow{\;\cong\;} \mathrm{Id}_{\mathfrak{J}(\Pi_i B_i)} \circ \mathfrak{J}\langle f_\bullet \rangle \xRightarrow{(\mathrm{c}^{\times}_{B_\bullet})^{-1} \circ \mathfrak{J}\langle f_\bullet \rangle} \mathrm{q}^{\times}_{B_\bullet} \circ \langle \mathfrak{J}\pi_\bullet \rangle \circ \mathfrak{J}\langle f_\bullet \rangle \xRightarrow{\mathrm{q}^{\times}_{B_\bullet} \circ \mathrm{unpack}} \mathrm{q}^{\times}_{B_\bullet} \circ \langle \mathfrak{J}f_\bullet \rangle.$$

#### **4.2 Exponentials in gl(**J**)**

As in the 1-categorical case, the definition of currying in $\mathrm{gl}(\mathfrak{J})$ employs pullbacks. A pullback of a cospan $(X_1 \to X_0 \leftarrow X_2)$ in a bicategory $\mathcal{B}$ is a bilimit for the strict pseudofunctor $X : (1 \to 0 \leftarrow 2) \to \mathcal{B}$ determined by the cospan. We state the universal property in the form that will be most useful for our applications.

**Lemma 6.** The pullback of a cospan $(X_1 \xrightarrow{f_1} X_0 \xleftarrow{f_2} X_2)$ in a bicategory $\mathcal{B}$ is determined, up to equivalence, by the following data and properties: a span $(X_1 \xleftarrow{\gamma_1} P \xrightarrow{\gamma_2} X_2)$ in $\mathcal{B}$ and an invertible 2-cell $\overline{\gamma}$ filling the diagram on the left below

such that

1. for any other diagram as on the right above there exists a fill-in $(u, \Xi_1, \Xi_2)$, namely a 1-cell $u : Q \to P$ and invertible 2-cells $\Xi_i : \gamma_i \circ u \Rightarrow \mu_i$ $(i = 1, 2)$ satisfying

$$\begin{array}{ccc}
(f_2 \circ \gamma_2) \circ u & \xrightarrow{\;\cong\;} f_2 \circ (\gamma_2 \circ u) \xrightarrow{\;f_2 \circ \Xi_2\;} & f_2 \circ \mu_2 \\
{\scriptstyle \overline{\gamma} \circ u}\Big\downarrow & & \Big\downarrow{\scriptstyle \overline{\mu}} \\
(f_1 \circ \gamma_1) \circ u & \xrightarrow{\;\cong\;} f_1 \circ (\gamma_1 \circ u) \xrightarrow{\;f_1 \circ \Xi_1\;} & f_1 \circ \mu_1
\end{array}$$

2. for any 1-cells $v, w : Q \to P$ and 2-cells $\Psi_i : \gamma_i \circ v \Rightarrow \gamma_i \circ w$ $(i = 1, 2)$ satisfying

$$\begin{array}{ccc}
(f_2 \circ \gamma_2) \circ v & \xrightarrow{\;\cong\;} f_2 \circ (\gamma_2 \circ v) \xrightarrow{\;f_2 \circ \Psi_2\;} f_2 \circ (\gamma_2 \circ w) \xrightarrow{\;\cong\;} & (f_2 \circ \gamma_2) \circ w \\
{\scriptstyle \overline{\gamma} \circ v}\Big\downarrow & & \Big\downarrow{\scriptstyle \overline{\gamma} \circ w} \\
(f_1 \circ \gamma_1) \circ v & \xrightarrow{\;\cong\;} f_1 \circ (\gamma_1 \circ v) \xrightarrow{\;f_1 \circ \Psi_1\;} f_1 \circ (\gamma_1 \circ w) \xrightarrow{\;\cong\;} & (f_1 \circ \gamma_1) \circ w
\end{array}$$

there exists a unique 2-cell $\Psi : v \Rightarrow w$ such that $\Psi_i = \gamma_i \circ \Psi$ $(i = 1, 2)$.


We now define exponentials in the glueing bicategory. Precisely, we extend Proposition 1 to the following.

**Theorem 5.** Let $(\mathcal{B}, \Pi_n(-), \Rightarrow)$ and $(\mathcal{C}, \Pi_n(-), \Rightarrow)$ be cc-bicategories such that $\mathcal{C}$ has pullbacks. For any fp-pseudofunctor $(\mathfrak{J}, \mathrm{q}^{\times}) : (\mathcal{B}, \Pi_n(-)) \to (\mathcal{C}, \Pi_n(-))$, the glueing bicategory $\mathrm{gl}(\mathfrak{J})$ has a cartesian closed structure, with forgetful pseudofunctor $\pi_{\mathrm{dom}} : \mathrm{gl}(\mathfrak{J}) \to \mathcal{B}$ strictly preserving products and exponentials.

The evaluation map. We begin by defining the mapping $(-) \Rightarrow (=)$ and the evaluation 1-cell $\underline{\mathrm{eval}}$. For $\underline{C} := (C, c, B),\ \underline{C'} := (C', c', B') \in \mathrm{gl}(\mathfrak{J})$ we set $\underline{C} \Rightarrow \underline{C'}$ to be the left-hand vertical leg of the following pullback diagram, in which we write $m_{B,B'} := \lambda\big(\mathfrak{J}(\mathrm{eval}_{B,B'}) \circ \mathrm{q}^{\times}_{B \Rightarrow B', B}\big)$.

$$\begin{array}{ccc}
(C \supset C') & \xrightarrow{\;q_{c,c'}\;} & (C \Rightarrow C') \\
{\scriptstyle p_{c,c'}}\Big\downarrow & \overset{\omega_{c,c'}}{\cong} & \Big\downarrow{\scriptstyle \lambda(c' \circ \mathrm{eval}_{C,C'})} \\
\mathfrak{J}(B \Rightarrow B') & \xrightarrow[\;\lambda(\mathrm{eval}_{\mathfrak{J}B,\mathfrak{J}B'} \circ ((\mathfrak{J}B \Rightarrow \mathfrak{J}B') \times c)) \circ m_{B,B'}\;]{} & (C \Rightarrow \mathfrak{J}B')
\end{array} \tag{4}$$

Example 7. The pullback (4) generalises the well-known definition of a logical relation of varying arity [36]. Indeed, where $\mathfrak{J} := \langle\mathcal{K}\rangle$ is the relative hom-pseudofunctor for an fp-pseudofunctor $(\mathcal{K}, \mathrm{q}^{\times}) : \mathcal{B} \to \mathcal{X}$ between cc-bicategories, $A \in \mathcal{B}$ and $X, X' \in \mathcal{X}$, the functor $m_{X,X'}(A)$ takes a 1-cell $f : \mathcal{K}A \to (X \Rightarrow X')$ in $\mathcal{X}$ to the pseudonatural transformation $\mathbf{Y}A \times \mathcal{X}(\mathcal{K}(-), X) \Rightarrow \mathcal{X}(\mathcal{K}(-), X')$ with components $\lambda B \,.\, \lambda(\rho : B \to A,\ u : \mathcal{K}B \to X) \,.\, \mathrm{eval}_{X,X'} \circ \langle f \circ \mathcal{K}(\rho), u \rangle$. Intuitively, therefore, the pullback enforces the usual closure condition defining a logical relation at exponential type, while also tracking the isomorphism witnessing that this condition holds (cf. [36,3,15]).

Notation 6. For reasons of space (particularly in pasting diagrams) we will sometimes write $\widetilde{c} := \mathrm{eval}_{\mathfrak{J}B,\mathfrak{J}B'} \circ ((\mathfrak{J}B \Rightarrow \mathfrak{J}B') \times c) : (\mathfrak{J}B \Rightarrow \mathfrak{J}B') \times C \to \mathfrak{J}B'$ when $c : C \to \mathfrak{J}B$ in $\mathcal{C}$.

The evaluation map $\underline{\mathrm{eval}}_{\underline{C},\underline{C'}}$ is defined to be $\big(\mathrm{eval}_{C,C'} \circ (q_{c,c'} \times C),\ \overline{\mathrm{E}}_{c,c'},\ \mathrm{eval}_{B,B'}\big)$, where the witnessing 2-cell $\overline{\mathrm{E}}_{c,c'}$ is given by the pasting diagram below, in which the unlabelled arrow is $\mathrm{q}^{\times}_{(B \Rightarrow B', B)} \circ (p_{c,c'} \times c)$.

Here the bottom $\cong$ denotes a composite of $\Phi$, structural isomorphisms and $\Phi^{-1}$, and the top $\cong$ denotes a composite of $\omega_{c,c'} \times C$ with instances of $\Phi$, $\Phi^{-1}$, and the structural isomorphisms.

The currying operation. Let $\underline{R} := (R, r, Q)$, $\underline{C} := (C, c, B)$ and $\underline{C'} := (C', c', B')$, and suppose given a 1-cell $(t, \alpha, s) : \underline{R} \times \underline{C} \to \underline{C'}$. We construct $\underline{\lambda}(t, \alpha, s)$ using the universal property (4) of the pullback. To this end, we define invertible composites $\mathrm{U}_{\alpha}$ and $\mathrm{T}_{\alpha}$ as in the following two diagrams, and set $\mathrm{L}_{\alpha} := \eta^{-1} \bullet \mathrm{e}^{\dagger}\big(\mathrm{U}_{\alpha}^{-1} \circ \alpha \circ \mathrm{T}_{\alpha}\big) : \lambda(c' \circ \mathrm{eval}_{C,C'}) \circ \lambda t \Rightarrow (\lambda(\widetilde{c}) \circ m_{B,B'}) \circ (\mathfrak{J}(\lambda s) \circ r)$.

The invertible 2-cell $\mathrm{U}_{\alpha}$ has the form

$$\mathrm{U}_{\alpha} \,:\, \mathrm{eval}_{C,\mathfrak{J}B'} \circ \Big(\big((\lambda(\widetilde{c}) \circ m_{B,B'}) \circ (\mathfrak{J}(\lambda s) \circ r)\big) \times C\Big) \;\Rightarrow\; \mathfrak{J}(s) \circ \big(\mathrm{q}^{\times}_{Q,B} \circ (r \times c)\big)$$

and is the evident composite of structural isomorphisms, the counit 2-cell for $\lambda(\widetilde{c})$, instances of the pseudofunctoriality constraints of $\mathfrak{J}$, and $\mathfrak{J}(\varepsilon_s)$, via $\mathfrak{J}\big(\mathrm{eval}_{B,B'} \circ (\lambda s \times B)\big) \circ \big(\mathrm{q}^{\times}_{Q,B} \circ (r \times c)\big)$.

The unlabelled arrow is the canonical composite of $\mathrm{nat}_{\lambda s, \mathrm{id}_B}$ with $\phi^{\mathfrak{J}}_{\mathrm{eval}, \lambda(s) \times B}$ and structural isomorphisms. $\mathrm{T}_{\alpha}$ is then defined using $\mathrm{U}_{\alpha}$:

$$\mathrm{T}_{\alpha} \,:\, \mathrm{eval}_{C,\mathfrak{J}B'} \circ \Big(\big(\lambda(c' \circ \mathrm{eval}_{C,C'}) \circ \lambda t\big) \times C\Big) \;\Rightarrow\; c' \circ t$$

again an evident composite of structural isomorphisms and counit 2-cells.

Applying the universal property of the pullback (4) to $\mathrm{L}_{\alpha}$, one obtains a 1-cell $\mathrm{lam}(t)$ and a pair of invertible 2-cells $\Gamma_{c,c'}$ and $\Delta_{c,c'}$ filling the diagram

We define $\underline{\lambda}(t, \alpha, s) := \big(\mathrm{lam}(t), \Gamma_{c,c'}, \lambda s\big)$.

The counit 2-cell. Finally we come to the counit. For a 1-cell $\underline{t} := (t, \alpha, s) : (R, r, Q) \times (C, c, B) \to (C', c', B')$, the 1-cell $\underline{\mathrm{eval}} \circ \big(\underline{\lambda}(t, \alpha, s) \times (C, c, B)\big)$ unwinds to the pasting diagram below, in which the unlabelled arrow is $\mathrm{q}^{\times}_{Q,B} \circ (r \times c)$:

For the counit $\underline{\varepsilon}_{\underline{t}}$ we take the 2-cell with first component $\mathrm{e}_t$ defined by

$$(\mathrm{eval}_{C,C'} \circ (q_{c,c'} \times C)) \circ (\mathrm{lam}(t) \times C) \xrightarrow{\;\cong\;} \mathrm{eval}_{C,C'} \circ \big((q_{c,c'} \circ \mathrm{lam}(t)) \times C\big) \xrightarrow{\;\mathrm{eval}_{C,C'} \circ (\Delta_{c,c'} \times C)\;} \mathrm{eval}_{C,C'} \circ (\lambda(t) \times C) \xrightarrow{\;\varepsilon_t\;} t$$

and second component simply $\varepsilon_s : \mathrm{eval}_{B,B'} \circ (\lambda(s) \times B) \Rightarrow s$. This pair forms an invertible 2-cell in $\mathrm{gl}(\mathfrak{J})$. One checks that it satisfies the required universal property in a manner analogous to the 1-categorical case (see [55] for the full details). This completes the proof of Theorem 5.

## **5 Relative full completeness**

We apply the theory developed in the preceding two sections to prove the relative full completeness result. As outlined in the introduction, this corresponds to a proof that the higher-order equational theory of rewriting in the STLC is conservative over the algebraic equational theory of rewriting in the STPC. We adapt 'Lafont's argument' [39, Annexe C] from the form presented in [16], for which we require bicategorical versions of the free cartesian category $\mathbf{F}^{\times}[\mathbf{C}]$ and the free cartesian closed category $\mathbf{F}^{\times,\to}[\mathbf{C}]$ over a category $\mathbf{C}$. In line with the strategy for the STLC (cf. [12, pp. 173–4]), we deal with the contravariance of the pseudofunctor $(- \Rightarrow =)$ by restricting to a bicategory of cc-pseudofunctors, pseudonatural equivalences (that is, pseudonatural transformations for which each component is a given equivalence), and invertible modifications. We denote this with the subscript $\simeq, \cong$.

**Lemma 7.** For any bicategory $\mathcal{B}$, fp-bicategory $(\mathcal{C}, \Pi_n(-))$ and cc-bicategory $(\mathcal{D}, \Pi_n(-), \Rightarrow)$:

1. There exists an fp-bicategory $\mathcal{F}^{\times}[\mathcal{B}]$ and a pseudofunctor $\eta^{\times} : \mathcal{B} \to \mathcal{F}^{\times}[\mathcal{B}]$ such that composition with $\eta^{\times}$ induces a biequivalence

$$\mathbf{fp}\text{-}\mathbf{Bicat}(\mathcal{F}^{\times}[\mathcal{B}], \mathcal{C}) \xrightarrow{\;\simeq\;} \mathbf{Bicat}(\mathcal{B}, \mathcal{C}),$$

2. There exists a cc-bicategory $\mathcal{F}^{\times,\to}[\mathcal{B}]$ and a pseudofunctor $\eta^{\Rightarrow} : \mathcal{B} \to \mathcal{F}^{\times,\to}[\mathcal{B}]$ such that composition with $\eta^{\Rightarrow}$ induces a biequivalence

$$\mathbf{cc}\text{-}\mathbf{Bicat}_{\simeq,\cong}(\mathcal{F}^{\times,\to}[\mathcal{B}], \mathcal{D}) \xrightarrow{\;\simeq\;} \mathbf{Bicat}(\mathcal{B}, \mathcal{D}).$$

Proof (sketch). A syntactic construction suffices: one defines formal products and exponentials and then quotients by the axioms (see [48, p. 79] or [55]).

Thus, for any bicategory $\mathcal{B}$, fp-bicategory $(\mathcal{C}, \Pi_n(-))$, and pseudofunctor $F : \mathcal{B} \to \mathcal{C}$ there exists an fp-pseudofunctor $F^{\#} : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{C}$ and an equivalence $F^{\#} \circ \eta^{\times} \simeq F$. Moreover, for any fp-pseudofunctor $G : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{C}$ such that $G \circ \eta^{\times} \simeq F$ one has $G \simeq F^{\#}$. A corresponding result holds for cc-bicategories and cc-pseudofunctors.

**Theorem 7.** For any bicategory $\mathcal{B}$, the universal fp-pseudofunctor $\iota : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{F}^{\times,\to}[\mathcal{B}]$ extending $\eta^{\Rightarrow}$ is locally an equivalence. Hence $\eta^{\Rightarrow} : \mathcal{B} \to \mathcal{F}^{\times,\to}[\mathcal{B}]$ is locally an equivalence.

Proof. Since $\iota$ preserves finite products, the bicategory $\mathrm{gl}(\langle\iota\rangle)$ is cartesian closed (Theorem 5). The composite $K := \underline{\mathbf{Y}} \circ \eta^{\times} : \mathcal{B} \to \mathrm{gl}(\langle\iota\rangle)$ therefore induces a cc-pseudofunctor $K^{\#} : \mathcal{F}^{\times,\to}[\mathcal{B}] \to \mathrm{gl}(\langle\iota\rangle)$.

First observe that $(K^{\#} \circ \iota) \circ \eta^{\times} \simeq K^{\#} \circ \eta^{\Rightarrow} \simeq K = \underline{\mathbf{Y}} \circ \eta^{\times}$. Since $\underline{\mathbf{Y}}$ is canonically an fp-pseudofunctor (Remark 4), it follows that $K^{\#} \circ \iota \simeq \underline{\mathbf{Y}}$. Since $\underline{\mathbf{Y}}$ is locally an equivalence (Lemma 5), Lemma 1(1) entails that $K^{\#} \circ \iota$ is locally an equivalence.

Next, examining the definition of $\underline{\mathbf{Y}}$ one sees that $\pi_{\mathrm{dom}} \circ \underline{\mathbf{Y}} = \iota$, and so

$$(\pi_{\mathrm{dom}} \circ K^{\#}) \circ \eta^{\Rightarrow} \simeq (\pi_{\mathrm{dom}} \circ \underline{\mathbf{Y}}) \circ \eta^{\times} \simeq \iota \circ \eta^{\times} \simeq \eta^{\Rightarrow}.$$

It follows that $\pi_{\mathrm{dom}} \circ K^{\#} \simeq \mathrm{id}_{\mathcal{F}^{\times,\to}[\mathcal{B}]}$, and hence that $\pi_{\mathrm{dom}} \circ K^{\#}$ is also locally an equivalence.

Now consider the composite $\mathcal{F}^{\times}[\mathcal{B}] \xrightarrow{\;\iota\;} \mathcal{F}^{\times,\to}[\mathcal{B}] \xrightarrow{\;K^{\#}\;} \mathrm{gl}(\langle\iota\rangle) \xrightarrow{\;\pi_{\mathrm{dom}}\;} \mathcal{F}^{\times,\to}[\mathcal{B}]$. By Lemma 1(2) and the preceding, $\iota$ is locally an equivalence. Finally, it is direct from the construction of $\mathcal{F}^{\times}[\mathcal{B}]$ that $\eta^{\times}$ is locally an equivalence; thus so is $\iota \circ \eta^{\times} \simeq \eta^{\Rightarrow}$.

Acknowledgements. We thank all the anonymous reviewers for their comments: these improved the paper substantially. We are especially grateful to the reviewer who pointed out an oversight in the original formulation of Lemma 1(2), which consequently affected the argument in Theorem 7, and provided the elegant fix therein.

The second author was supported by a Royal Society University Research Fellow Enhancement Award.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **A duality theoretic view on limits of finite structures**⋆

Mai Gehrke¹, Tomáš Jakl¹, and Luca Reggio²(✉)

¹ CNRS and Université Côte d'Azur, Nice, France

{mgehrke,tomas.jakl}@unice.fr
² Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic, and Mathematical Institute, University of Bern, Switzerland
luca.reggio@math.unibe.ch

**Abstract.** A systematic theory of *structural limits* for finite models has been developed by Nešetřil and Ossona de Mendez. It is based on the insight that the collection of finite structures can be embedded, via a map they call the *Stone pairing*, in a space of measures, where the desired limits can be computed. We show that a closely related but finer grained space of measures arises, via Stone–Priestley duality and the notion of types from model theory, by enriching the expressive power of first-order logic with certain "probabilistic operators". We provide a sound and complete calculus for this extended logic and expose the functorial nature of this construction.

The consequences are two-fold. On the one hand, we identify the logical gist of the theory of structural limits. On the other hand, our construction shows that the duality-theoretic variant of the Stone pairing captures the adding of a layer of quantifiers, thus making a strong link to recent work on semiring quantifiers in logic on words. In the process, we identify the model theoretic notion of *types* as the unifying concept behind this link. These results contribute to bridging the strands of logic in computer science which focus on semantics and on more algorithmic and complexity related areas, respectively.

**Keywords:** Stone duality · finitely additive measures · structural limits · finite model theory · formal languages · logic on words

## **1 Introduction**

While topology plays an important role, via Stone duality, in many parts of semantics, topological methods in more algorithmic and complexity oriented areas of theoretical computer science are not so common. One of the few examples,

<sup>-</sup> This project has been supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No.670624). Luca Reggio has received an individual support under the grants GA17-04630S of the Czech Science Foundation, and No.184693 of the Swiss National Science Foundation.

the one we want to consider here, is the study of limits of finite relational structures. We will focus on the *structural limits* introduced by Nešetřil and Ossona de Mendez [15,17]. These provide a common generalisation of various notions of limits of finite structures studied in probability theory, random graphs, structural graph theory, and finite model theory. The basic construction in this work is the so-called *Stone pairing*. Given a relational signature $\sigma$ and a first-order formula $\varphi$ in the signature $\sigma$ with free variables $v_1, \dots, v_n$, define

$$\langle \varphi, A \rangle = \frac{|\{\overline{a} \in A^{n} \mid A \models \varphi(\overline{a})\}|}{|A|^{n}} \qquad \begin{pmatrix}\text{the probability that a random}\\ \text{assignment in } A \text{ satisfies } \varphi\end{pmatrix} \tag{1}$$

Nešetřil and Ossona de Mendez view the map $A \mapsto \langle -, A \rangle$ as an embedding of the finite $\sigma$-structures into the space of probability measures over the Stone space dual to the Lindenbaum–Tarski algebra of all first-order formulas in the signature $\sigma$. This space is complete, and thus provides the desired limit objects for all sequences of finite structures which embed as Cauchy sequences.
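The Stone pairing (1) is elementary to compute on a concrete finite structure. As a hedged illustration (the function and structure names are ours, not from the paper), take the formula $\varphi(v_1, v_2) = E(v_1, v_2)$ on a directed 4-cycle:

```python
# Compute the Stone pairing <phi, A> of equation (1): the fraction of
# n-tuples of elements of A that satisfy phi. Illustrative sketch only.
from itertools import product
from fractions import Fraction

def stone_pairing(universe, satisfies, n):
    """<phi, A> = |{a in A^n : A |= phi(a)}| / |A|^n."""
    sat = sum(1 for tup in product(universe, repeat=n) if satisfies(tup))
    return Fraction(sat, len(universe) ** n)

# A = the 4-cycle on {0,1,2,3}, with edges in both directions;
# phi(v1, v2) = "there is an edge from v1 to v2".
V = [0, 1, 2, 3]
E = {(i, (i + 1) % 4) for i in range(4)} | {((i + 1) % 4, i) for i in range(4)}
p = stone_pairing(V, lambda t: (t[0], t[1]) in E, 2)
print(p)  # 1/2: 8 directed edges out of 4^2 = 16 pairs
```

A sequence of structures is Cauchy in the sense above precisely when such values converge for every formula, which is what the measure-space completion makes precise.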

Another example of topological methods in an algorithmically oriented area of computer science is the use of profinite monoids in automata theory. In this setting, profinite monoids are the subject of an extensive theory, based on theorems by Eilenberg and Reiterman, and used, among other things, to settle decidability questions [18]. In [4] it was shown that this theory may be understood as an application of Stone duality, thus making a bridge between semantics and more algorithmically oriented work. Bridging this semantics-versus-algorithmics gap in theoretical computer science has since gained quite some momentum, notably with the recent strand of research by Abramsky, Dawar and co-workers [2,3]. In this spirit, a natural question is whether the structural limits of Nešetřil and Ossona de Mendez can also be understood semantically, and in particular whether the topological component may be seen as an application of Stone duality.

More precisely, recent work on understanding quantifiers in the setting of logic on finite words [5] has shown that adding a layer of certain quantifiers (such as classical and modular quantifiers) corresponds dually to measure space constructions. The measures involved are not classical but only finitely additive and they take values in finite semirings rather than in the unit interval. Nevertheless, this appearance of *measures as duals of quantifiers* begs the further question whether the measure spaces in the theory of structural limits may be obtained via Stone duality from a semantic addition of certain quantifiers to classical first-order logic.

The purpose of this paper is to address this question. Our main result is that the Stone pairing of Nešetřil and Ossona de Mendez is related by a retraction to a Stone space of measures, which is dual to the Lindenbaum–Tarski algebra of a logic fragment obtained from first-order logic by adding one layer of probabilistic quantifiers, and which arises in exactly the same way as the spaces of semiring-valued measures in logic on words. That is, the Stone pairing, although originating from other considerations, may be seen as arising by duality from a semantic construction.

A foreseeable hurdle is that spaces of classical measures are valued in the unit interval [0, 1], which is not zero-dimensional and hence outside the scope of Stone duality. This is well known to cause problems, e.g. in attempts to combine nondeterminism and probability in domain theory [12]. However, in the structural limits of Nešetřil and Ossona de Mendez, at the base one only needs to talk about finite models equipped with normal distributions, and thus only the finite intervals $I_n = \{0, \frac{1}{n}, \frac{2}{n}, \dots, 1\}$ are involved. A careful duality-theoretic analysis identifies a codirected diagram (i.e. an inverse limit system) based on these intervals, compatible with the Stone pairing. The resulting inverse limit, which we denote **Γ**, is a Priestley space. It comes equipped with an algebra-like structure, which allows us to reformulate many aspects of the theory of structural limits in terms of **Γ**-valued measures as opposed to [0, 1]-valued measures.

The analysis justifying the structure of **Γ** is based on duality theory for double quasi-operator algebras [7,8]. In the presentation, we have tried to compromise between giving interesting topo-relational insights into why **Γ** is as it is, and not overburdening the reader with technical details. Some interesting features of **Γ**, dictated by the nature of the Stone pairing and the ensuing codirected diagram, are that


These features are a consequence of general theory and precisely allow us to witness continuous phenomena relative to [0, 1] in the setting of **<sup>Γ</sup>**.

#### **Our contribution**

We show that the ambient measure space for the structural limits of Nešetřil and Ossona de Mendez can be obtained via *"adding a layer of quantifiers"* in a suitable enrichment of first-order logic. The conceptual framework for seeing this is that of *types* from classical model theory. More precisely, we will see that a variant of the Stone pairing is a map into a space of measures with values in a Priestley space **Γ**. Further, we show that this map is in fact the embedding of the finite structures into the space of (0-)types of an extension of first-order logic, which we axiomatise. On the other hand, **Γ**-valued measures and [0, 1]-valued measures are tightly related by a retraction-section pair which allows the transfer of properties. These results identify the logical gist of the theory of structural limits and provide a new interesting connection between logic on words and the theory of structural limits in finite model theory.

*Outline of the paper.* In Section 2 we briefly recall Stone–Priestley duality, its application in logic via spaces of types, and the particular instance of logic on words (needed only to show the similarity of the constructions). In Section 3 we introduce the Priestley space **Γ** with its additional operations, and show that it admits [0, 1] as a retract. The spaces of **Γ**-valued measures are introduced in Section 4, and the retraction of **Γ** onto [0, 1] is lifted to the appropriate spaces of measures. In Section 5 we introduce the **Γ**-valued Stone pairing and make the link with logic on words. Further, we compare convergence in the space of **Γ**-valued measures with the one considered by Nešetřil and Ossona de Mendez. Finally, in Section 6 we show that constructing the space of **Γ**-valued measures dually corresponds to enriching the logic with probabilistic operators.

## **2 Preliminaries**

*Notation.* Throughout this paper, if $X \xrightarrow{f} Y \xrightarrow{g} Z$ are functions, their composition is denoted $g \cdot f$. For a subset $S \subseteq X$, $f|_S : S \to Y$ is the obvious restriction. Given any set $T$, $\wp(T)$ denotes its power set. Further, for a poset $P$, $P^{\partial}$ is the poset obtained by turning the order of $P$ upside down.

## **2.1 Stone-Priestley duality**

In this paper, we will need Stone duality for bounded distributive lattices in the order topological form due to Priestley [19]. It is a powerful and well established tool in the study of propositional logic and semantics of programming languages, see e.g. [9,1] for major landmarks. We briefly recall how this duality works.

A *compact ordered space* is a pair $(X, \leq)$ where $X$ is a compact space and $\leq$ is a partial order on $X$ which is closed in the product topology of $X \times X$. (Note that such a space is automatically Hausdorff.) A compact ordered space is a *Priestley space* provided it is *totally order-disconnected*. That is, for all $x, y \in X$ such that $x \nleq y$, there is a *clopen* (i.e. simultaneously closed and open) $C \subseteq X$ which is an up-set for $\leq$, and satisfies $x \in C$ but $y \notin C$. We recall the construction of the Priestley space of a distributive lattice $D$.³

A non-empty proper subset $F \subset D$ is a *prime filter* if it is *(i)* upward closed (in the natural order of $D$), *(ii)* closed under finite meets, and *(iii)* prime: if $a \vee b \in F$, then either $a \in F$ or $b \in F$. Denote by $X_D$ the set of all prime filters of $D$. By Stone's Prime Filter Theorem, the map

$$\widehat{(-)} \colon D \to \wp(X_D), \qquad a \mapsto \widehat{a} = \{F \in X_D \mid a \in F\}$$

is an embedding. Priestley's insight was that $D$ can be recovered from $X_D$ if the latter is equipped with the inclusion order and the topology generated by the sets of the form $\widehat{a}$ and their complements. This makes $X_D$ into a Priestley space, the *dual space* of $D$, and the map $\widehat{(-)}$ is an isomorphism between $D$ and the lattice of clopen up-sets of $X_D$. Conversely, any Priestley space $X$ is the dual space of the lattice of its clopen up-sets. We call the latter the *dual lattice* of $X$. This correspondence extends to morphisms. In fact, Priestley duality states that the category of distributive lattices with homomorphisms is dually equivalent to the category of Priestley spaces and continuous monotone maps.
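For a finite distributive lattice the whole construction can be carried out by brute force. The following sketch (our own choice of example: the divisor lattice of 12, with meet = gcd and join = lcm; all function names are illustrative) enumerates the prime filters and verifies that $a \mapsto \widehat{a}$ is injective, as Stone's theorem guarantees:

```python
# Enumerate the prime filters of a finite distributive lattice D and
# build the map a |-> a^ = {F in X_D : a in F}. D = divisors of 12
# under divisibility; this example is ours, not the paper's.
from itertools import combinations
from math import gcd

D = [1, 2, 3, 4, 6, 12]
meet = gcd
join = lambda a, b: a * b // gcd(a, b)
leq  = lambda a, b: b % a == 0          # a <= b iff a divides b

def is_prime_filter(F):
    F = set(F)
    if not F or F == set(D):            # non-empty and proper
        return False
    up_closed   = all(b in F for a in F for b in D if leq(a, b))
    meet_closed = all(meet(a, b) in F for a in F for b in F)
    prime       = all((a in F or b in F)
                      for a in D for b in D if join(a, b) in F)
    return up_closed and meet_closed and prime

subsets = (c for r in range(len(D) + 1) for c in combinations(D, r))
primes = [frozenset(F) for F in subsets if is_prime_filter(F)]
print(len(primes))  # 3: the up-sets of the join-irreducibles 2, 3 and 4

hat = {a: {F for F in primes if a in F} for a in D}
# a |-> a^ is injective (an embedding), so D is recoverable from X_D:
print(len({frozenset(v) for v in hat.values()}) == len(D))  # True
```

Equipping the three-point set of prime filters with inclusion and declaring the $\widehat{a}$ (and their complements) open recovers exactly the finite Priestley space dual to this lattice.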

<sup>3</sup> We assume all distributive lattices are bounded, with the bottom and top denoted by 0 and 1, respectively. The bounds need to be preserved by homomorphisms.

When restricting to Boolean algebras, we recover the celebrated Stone duality restricted to Boolean algebras and *Boolean spaces*, i.e. compact Hausdorff spaces in which the clopen subsets form a basis.

#### **2.2 Stone duality and logic: type spaces**

The *theory of types* is an important tool for first-order logic. We briefly recall the concept as it is closely related to, and provides the link between, two otherwise unrelated occurrences of topological methods in theoretical computer science.

Consider a signature $\sigma$ and a first-order theory $T$ in this signature. For each $n \in \mathbb{N}$, let $\mathrm{Fm}_n$ denote the set of first-order formulas whose free variables are among $\overline{v} = \{v_1, \dots, v_n\}$, and let $\mathrm{Mod}_n(T)$ denote the class of all pairs $(A, \alpha)$ where $A$ is a model of $T$ and $\alpha$ is an interpretation of $\overline{v}$ in $A$. Then the satisfaction relation, $(A, \alpha) \models \varphi$, is a binary relation from $\mathrm{Mod}_n(T)$ to $\mathrm{Fm}_n$. It induces the equivalence relations of elementary equivalence $\equiv$ and logical equivalence $\approx$ on these sets, respectively. The quotient $\mathrm{FO}_n(T) = \mathrm{Fm}_n/{\approx}$ carries a natural Boolean algebra structure and is known as the $n$*-th Lindenbaum–Tarski algebra* of $T$. Its dual space is $\mathrm{Typ}_n(T)$, the *space of $n$-types* of $T$, whose points can be identified with elements of $\mathrm{Mod}_n(T)/{\equiv}$. The Boolean algebra $\mathrm{FO}(T)$ of *all* first-order formulas modulo logical equivalence over $T$ is the directed colimit of the $\mathrm{FO}_n(T)$ for $n \in \mathbb{N}$, while its dual space, $\mathrm{Typ}(T)$, is the codirected limit of the $\mathrm{Typ}_n(T)$ for $n \in \mathbb{N}$ and consists of models equipped with interpretations of the full set of variables.
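The Lindenbaum–Tarski quotient can be illustrated in miniature with propositional rather than first-order logic (a simplification of ours: formulas over two propositional variables, identified when they have the same truth table, form a 16-element Boolean algebra):

```python
# Miniature Lindenbaum-Tarski algebra: propositional formulas over p, q
# modulo logical equivalence = truth tables over the four valuations.
# Closing {p, q} under not/and/or yields all 2^(2^2) = 16 classes.
from itertools import product

vals = list(product([False, True], repeat=2))  # valuations of (p, q)
p = tuple(v[0] for v in vals)
q = tuple(v[1] for v in vals)

classes = {p, q}
changed = True
while changed:  # close the set of truth tables under the connectives
    new = set(classes)
    for s in classes:
        new.add(tuple(not x for x in s))                      # negation
        for t in classes:
            new.add(tuple(x and y for x, y in zip(s, t)))     # conjunction
            new.add(tuple(x or y for x, y in zip(s, t)))      # disjunction
    changed = new != classes
    classes = new

print(len(classes))  # 16 equivalence classes of formulas
```

The dual space of this 16-element algebra is the four-point set of valuations, a toy instance of the type-space construction above.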

If we want to study finite models, there are two equivalent approaches: at the level of sentences, we can either consider the theory $T_{\mathrm{fin}}$ of finite $T$-models, or the closure of the collection of all finite $T$-models in the space $\mathrm{Typ}_0(T)$. This closure yields a space which should tell us about finite $T$-structures. Indeed, it is equal to $\mathrm{Typ}_0(T_{\mathrm{fin}})$, the space of pseudofinite $T$-structures. For an application of this, see [10]. Below, we will see an application in finite model theory of the case $T = \emptyset$ (in this case we write $\mathrm{FO}(\sigma)$ and $\mathrm{Typ}(\sigma)$ instead of $\mathrm{FO}(\emptyset)$ and $\mathrm{Typ}(\emptyset)$).

In light of the theory of types as exposed above, the Stone pairing of Nešetřil and Ossona de Mendez (see equation (1)) can be regarded as an embedding of finite structures into the space of probability measures on $\mathrm{Typ}(\sigma)$, which set-theoretically are finitely additive functions $\mathrm{FO}(\sigma) \to [0,1]$.

#### **2.3 Duality and logic on words**

As mentioned in the introduction, spaces of measures arise via duality in *logic on words* [5]. Logic on words, as introduced by Büchi (see e.g. [14] for a recent survey), is a variation and specialisation of finite model theory where only models based on words are considered. That is, a word $w \in A^*$ is seen as a relational structure on $\{1,\dots,|w|\}$, where $|w|$ is the length of $w$, equipped with a unary relation $P_a$, for each $a \in A$, singling out the positions in the word where the letter $a$ appears. Each sentence $\varphi$ in a language interpretable over these structures yields a language $L_\varphi \subseteq A^*$ consisting of the words satisfying $\varphi$. Thus, logic fragments are considered modulo the theory of finite words, and the Lindenbaum-Tarski algebras are subalgebras of $\wp(A^*)$ consisting of the appropriate $L_\varphi$'s, cf. [10] for a treatment of first-order logic on words.

For lack of logical completeness, the duals of the Lindenbaum-Tarski algebras have more points than those given by models. Nevertheless, the dual spaces of types, which act as compactifications and completions of the collections of models, provide a powerful tool for studying logic fragments by topological means. The central notion is that of *recognition*, in which a Boolean subalgebra $B \subseteq \wp(A^*)$ is studied by means of the dual map $\eta \colon \beta(A^*) \to X_B$. Here $\beta(A^*)$ is the Stone dual of $\wp(A^*)$, also known in topology as the Čech-Stone compactification of the discrete space $A^*$, and $X_B$ is the Stone dual of $B$. The set $A^*$ embeds in $\beta(A^*)$, and $\eta$ is uniquely determined by its restriction $\eta_0 \colon A^* \to X_B$. Now, Stone duality implies that $L \subseteq A^*$ is in $B$ iff there is a clopen subset $V \subseteq X_B$ such that $\eta_0^{-1}(V) = L$. Anytime the latter is true for a map $\eta$ and a language $L$ as above, one says that $\eta$ *recognises* $L$.<sup>4</sup>
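As an aside, the shape of recognition by the restricted map $\eta_0$ can be sketched concretely; the map and the language below are illustrative stand-ins (a two-point space and parity counting), not taken from [5].

```python
# Sketch: recognition of a language by a map into a finite set X.
# Here eta0 sends a word to the parity of its number of a's; the
# languages recognised by eta0 are exactly the preimages of subsets of X.
X = {0, 1}                      # a finite "space" with two points

def eta0(w: str) -> int:
    """Map a word over {a, b} to a point of X: the parity of its a's."""
    return w.count("a") % 2

def recognised(V):
    """The language eta0^{-1}(V), as a membership predicate on words."""
    return lambda w: eta0(w) in V

L = recognised({1})             # words with an odd number of a's
assert L("a") and L("bab") and not L("bb")
```

In the duality-theoretic setting, $X_B$ is a Stone space and $V$ a clopen subset; for a finite discrete set every subset is clopen, which is what the sketch exploits.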

When studying logic fragments via recognition, the following inductive step is central: given a notion of quantifier and a recogniser for a Boolean algebra of formulas with a free variable, construct a recogniser for the Boolean algebra generated by the formulas obtained by applying the quantifier. This problem was solved in [5], using duality theory, in a general setting of *semiring quantifiers*. The latter are defined as follows: let $(S, +, \cdot, 0_S, 1_S)$ be a semiring, and $k \in S$. Given a formula $\psi(v)$, the formula $\exists_{S,k} v.\psi(v)$ is true of a word $w \in A^*$ iff $k = 1_S + \dots + 1_S$ ($m$ times), where $m$ is the number of assignments of the variable $v$ in $w$ satisfying $\psi(v)$. If $S = \mathbb{Z}/q\mathbb{Z}$, we obtain the so-called *modular quantifiers*, and for $S$ the two-element lattice we recover the existential quantifier $\exists$.
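A minimal operational reading of the semiring quantifier (a sketch; the word, the predicate $\psi$, and the concrete semirings are illustrative choices): count the satisfying assignments $m$ and compare $k$ with the $m$-fold sum $1_S + \cdots + 1_S$.

```python
# Sketch of the semantics of the semiring quantifier E_{S,k}: a word w
# satisfies E_{S,k} v.psi(v) iff k = 1_S + ... + 1_S (m times), where m
# is the number of positions v of w satisfying psi(v).
def semiring_quantifier(w, psi, add, one, zero, k):
    m = sum(1 for v in range(len(w)) if psi(w, v))
    acc = zero
    for _ in range(m):
        acc = add(acc, one)     # the m-fold sum 1_S + ... + 1_S
    return acc == k

w = "abbab"                     # two positions carry an 'a'
psi = lambda w, v: w[v] == "a"  # psi(v): "position v carries the letter a"

# S = the two-element lattice: recovers the existential quantifier.
assert semiring_quantifier(w, psi, lambda x, y: x or y, True, False, True)

# S = Z/2Z with k = 0: the modular quantifier "evenly many a's".
assert semiring_quantifier(w, psi, lambda x, y: (x + y) % 2, 1, 0, 0)
```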

To deal with formulas with a free variable, one considers maps of the form $f \colon \beta((A \times 2)^*) \to X$ (the extra bit in $A \times 2$ is used to mark the interpretation of the free variable). In [5] (see also [6]), it was shown that $L_{\psi(v)}$ is recognised by $f$ iff for every $k \in S$ the language $L_{\exists_{S,k} v.\psi(v)}$ is recognised by the composite

$$\xi \colon A^\* \xrightarrow{R} \hat{\mathbf{S}}(\beta((A \times 2)^\*)) \xrightarrow{\hat{\mathbf{S}}(f)} \hat{\mathbf{S}}(X), \tag{2}$$

where $\hat{\mathbf{S}}(X)$ is the space of finitely additive $S$-valued measures on $X$, and $R$ maps $w \in A^*$ to the measure $\mu_w \colon \wp((A \times 2)^*) \to S$ sending $K \subseteq (A \times 2)^*$ to the sum $1_S + \dots + 1_S$, $n_{w,K}$ times. Here, $n_{w,K}$ is the number of interpretations $\alpha$ of the free variable $v$ in $w$ such that the pair $(w, \alpha)$, seen as an element of $(A \times 2)^*$, belongs to $K$. Finally, $\hat{\mathbf{S}}(f)$ sends a measure to its pushforward along $f$.

## **3 The space Γ**

Central to our results is a Priestley space $\mathbf{\Gamma}$ closely related to $[0,1]$, in which our measures will take values. Its construction comes from the insight that the range

<sup>4</sup> Here we ignore the important role of the monoid structure available on the spaces (in the form of profinite monoids or BiMs, cf. [10,5]), as it is beyond the scope of this paper.

of the Stone pairing $\langle -, A \rangle$, for a finite structure $A$ and formulas restricted to a fixed number of free variables, can be confined to a chain $I_n = \{0, \frac{1}{n}, \frac{2}{n}, \dots, 1\}$. Moreover, the floor functions $f_{mn,n} \colon I_{mn} \twoheadrightarrow I_n$ are monotone surjections. The ensuing system $\{f_{mn,n} \colon I_{mn} \twoheadrightarrow I_n \mid m, n \in \mathbb{N}\}$ can thus be seen as a codirected diagram of finite discrete posets and monotone maps. Let us define $\mathbf{\Gamma}$ to be the limit of this diagram. Then, $\mathbf{\Gamma}$ is naturally equipped with the structure of a Priestley space, see e.g. [11, Corollary VI.3.3], and can be represented as based on the set

$$\{r^- \mid r \in (0,1] \} \cup \{q^\diamond \mid q \in \mathbb{Q} \cap [0,1] \}.$$

The order of $\mathbf{\Gamma}$ is the unique total order which has $0^\diamond$ as bottom element, satisfies $r^* < s^*$ if and only if $r < s$ for $* \in \{-, \diamond\}$, and such that $q^\diamond$ is a cover of $q^-$ for every rational $q \in (0,1]$ (i.e. $q^- < q^\diamond$, and there is no element strictly in between). In a sense, the values $q^-$ represent approximations of the values of the form $q^\diamond$. Cf. Figure 1. The topology of $\mathbf{\Gamma}$ is generated by the sets of the form

$$\uparrow p^{\diamond} = \{ x \in \Gamma \mid p^{\diamond} \le x \} \quad \text{and} \quad \downarrow q^{-} = \{ x \in \Gamma \mid x \le q^{-} \},$$

for $p, q \in \mathbb{Q} \cap [0,1]$ such that $q \neq 0$. The distributive lattice dual to $\mathbf{\Gamma}$, denoted by $\mathbf{L}$, is given by

$$\mathbf{L} = \{\bot\} \cup (\mathbb{Q} \cap [0,1])^{\partial}, \quad \text{with } \bot <_{\mathbf{L}} q \text{ and } q \le_{\mathbf{L}} p \text{ for every } p \le q \text{ in } \mathbb{Q} \cap [0,1].$$


**Fig. 1.** The Priestley space **Γ** and its dual lattice **L**
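The order on $\mathbf{\Gamma}$ is easy to model concretely. In the sketch below (an illustration, not part of the formal development), an element $r^-$ is encoded as the pair $(r, 0)$ and $q^\diamond$ as $(q, 1)$; lexicographic comparison of pairs then realises the total order of Figure 1.

```python
from fractions import Fraction as Q

MINUS, DIAM = 0, 1   # tags for r^- and q^<>; note MINUS < DIAM

def minus_elem(r):   # r^-, for r in (0, 1]
    assert 0 < r <= 1
    return (r, MINUS)

def diam_elem(q):    # q^<>, for rational q in [0, 1]
    assert 0 <= q <= 1
    return (q, DIAM)

# Lexicographic comparison of the pairs gives: r^* < s^* iff r < s, and
# q^<> covers q^-, with 0^<> as the bottom element.
assert minus_elem(Q(1, 2)) < diam_elem(Q(1, 2))   # q^- < q^<>
assert diam_elem(Q(1, 3)) < minus_elem(Q(1, 2))   # mixed tags, 1/3 < 1/2
assert diam_elem(Q(0)) < minus_elem(Q(1, 3))      # 0^<> is the bottom
```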

#### **3.1 The algebraic structure on Γ**

When defining measures we need an algebraic structure available on the space of values. The space $\mathbf{\Gamma}$ fulfils this requirement as it comes equipped with a partial operation $- \colon \mathrm{dom}(-) \to \mathbf{\Gamma}$, where $\mathrm{dom}(-) = \{(x, y) \in \mathbf{\Gamma} \times \mathbf{\Gamma} \mid y \le x\}$ and

$$r^\diamond - s^\diamond = (r-s)^\diamond, \qquad r^- - s^\diamond = (r-s)^-, \qquad r^\diamond - s^- = r^- - s^- = \begin{cases} (r-s)^\diamond & \text{if } r-s \in \mathbb{Q} \\ (r-s)^- & \text{otherwise.} \end{cases}$$
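For rational arguments (so that $r - s \in \mathbb{Q}$ always holds and the case split collapses), the partial minus can be sketched as follows; the pair encoding, with tag 0 for $r^-$ and tag 1 for $q^\diamond$, is an illustrative choice.

```python
from fractions import Fraction as Q

MINUS, DIAM = 0, 1   # tags for r^- and q^<>

# Sketch of the partial minus on Gamma, restricted to elements with
# rational underlying value: the result carries the tag '-' exactly in
# the case r^- - s^<>; all other cases yield (r - s)^<> here, since
# r - s is rational.
def gamma_minus(x, y):
    (r, tx), (s, ty) = x, y
    assert y <= x, "minus is only defined on pairs with y <= x"
    if tx == MINUS and ty == DIAM:
        return (r - s, MINUS)    # r^- - s^<> = (r - s)^-
    return (r - s, DIAM)         # remaining cases, with r - s rational

assert gamma_minus((Q(3, 4), DIAM), (Q(1, 4), DIAM)) == (Q(1, 2), DIAM)
assert gamma_minus((Q(3, 4), MINUS), (Q(1, 4), DIAM)) == (Q(1, 2), MINUS)
assert gamma_minus((Q(3, 4), DIAM), (Q(1, 4), MINUS)) == (Q(1, 2), DIAM)
assert gamma_minus((Q(3, 4), MINUS), (Q(1, 4), MINUS)) == (Q(1, 2), DIAM)
```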

In fact, this (partial) operation is dual to the truncated addition on the lattice **L**. However, explaining this would require us to delve into extended Priestley duality for lattices with operations, which is beyond the scope of this paper. See [9] and also [7,8] for details. It also follows from the general theory that there exists another partial operation definable from −, namely:

$$
\sim \colon \text{dom}(-) \to \Gamma, \quad x \sim y = \bigvee \{ x - q^\diamond \mid y < q^\diamond \le x \}.
$$

Next, we collect some basic properties of − and ∼, needed in Section 4, which follow from the general theory of [7,8]. First, recall that a map into an ordered topological space is *lower* (resp. *upper* ) *semicontinuous* provided the preimage of any open down-set (resp. open up-set) is open.

**Lemma 1.** *If* $\mathrm{dom}(-)$ *is seen as a subspace of* $\mathbf{\Gamma} \times \mathbf{\Gamma}^\partial$*, the following hold:*


#### **3.2 The retraction Γ ↠ [0, 1]**

In this section we show that, with respect to appropriate topologies, the unit interval $[0,1]$ can be obtained as a topological retract of $\mathbf{\Gamma}$, in a way which is compatible with the operation $-$. This will be important in Sections 4 and 5, where we need to move between $[0,1]$-valued and $\mathbf{\Gamma}$-valued measures. Let us define the monotone surjection given by collapsing the doubled elements:

$$
\gamma \colon \Gamma \to [0, 1], \ r^-, r^\diamond \mapsto r. \tag{3}
$$

The map γ has a right adjoint, given by

$$\iota \colon [0,1] \to \Gamma, \ r \mapsto \begin{cases} r^{\diamond} & \text{if } r \in \mathbb{Q} \\ r^{-} & \text{otherwise.} \end{cases} \tag{4}$$

Indeed, it is readily seen that $\gamma(y) \le x$ iff $y \le \iota(x)$, for all $y \in \mathbf{\Gamma}$ and $x \in [0,1]$. The composition $\gamma \cdot \iota$ coincides with the identity on $[0,1]$, i.e. $\iota$ is a section of $\gamma$. Moreover, this retraction lifts to a topological retract provided we equip $\mathbf{\Gamma}$ and $[0,1]$ with the topologies consisting of the open down-sets:

**Lemma 2.** *The map* $\gamma \colon \mathbf{\Gamma} \to [0,1]$ *is continuous, and the map* $\iota \colon [0,1] \to \mathbf{\Gamma}$ *is lower semicontinuous.*

*Proof.* To check continuity of $\gamma$ observe that, for a rational $q \in (0,1)$, $\gamma^{-1}(q, 1]$ and $\gamma^{-1}[0, q)$ coincide, respectively, with the open sets

$$\bigcup \{ \uparrow p^{\diamond} \mid p \in \mathbb{Q} \cap [0,1] \text{ and } q < p \} \text{ and } \bigcup \{ \downarrow p^{-} \mid p \in \mathbb{Q} \cap (0,1] \text{ and } p < q \}.$$

Also, $\iota$ is lower semicontinuous, for $\iota^{-1}(\downarrow q^-) = [0, q)$ whenever $q \in \mathbb{Q} \cap (0,1]$.

It is easy to see that both $\gamma$ and $\iota$ preserve the minus structure available on $\mathbf{\Gamma}$ and $[0,1]$ (the unit interval is equipped with the usual minus operation $x - y$, defined whenever $y \le x$), that is,

• $\gamma(x - y) = \gamma(x \sim y) = \gamma(x) - \gamma(y)$ whenever $y \le x$ in $\mathbf{\Gamma}$, and
• $\iota(x - y) = \iota(x) - \iota(y)$ whenever $y \le x$ in $[0,1]$.

**Remark.** $\iota \colon [0,1] \to \mathbf{\Gamma}$ is not upper semicontinuous because, for every $q \in \mathbb{Q} \cap (0,1]$, $\iota^{-1}(\uparrow q^\diamond) = \{x \in [0,1] \mid q^\diamond \le \iota(x)\} = \{x \in [0,1] \mid \gamma(q^\diamond) \le x\} = [q, 1]$, which is not open.
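Restricted to rational arguments (where $\iota$ always returns a $\diamond$-element), the retraction-section pair and the adjunction $\gamma(y) \le x$ iff $y \le \iota(x)$ can be checked mechanically; the pair encoding, with tag 0 for $r^-$ and tag 1 for $q^\diamond$, is an illustrative choice.

```python
from fractions import Fraction as Q

MINUS, DIAM = 0, 1   # tags for r^- and q^<>

def gamma(x):
    return x[0]              # collapse the doubled elements: r^-, r^<> |-> r

def iota(r):
    return (r, DIAM)         # a rational r goes to r^<>

points = [(Q(0), DIAM), (Q(1, 3), MINUS), (Q(1, 3), DIAM),
          (Q(1), MINUS), (Q(1), DIAM)]
for r in [Q(0), Q(1, 3), Q(1)]:
    assert gamma(iota(r)) == r                      # iota is a section of gamma
    for y in points:
        # the adjunction: gamma(y) <= r iff y <= iota(r)
        assert (gamma(y) <= r) == (y <= iota(r))
```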

## **4 Spaces of measures valued in Γ and in [0***,* **1]**

The aim of this section is to replace $[0,1]$-valued measures by $\mathbf{\Gamma}$-valued measures. The reason for doing this is two-fold. First, the space of $\mathbf{\Gamma}$-valued measures is Priestley (Proposition 4), and thus amenable to a duality-theoretic treatment and a dual logic interpretation (cf. Section 6). Second, it retains more topological information than the space of $[0,1]$-valued measures. Indeed, the former retracts onto the latter (Theorem 10).

Let $D$ be a distributive lattice. Recall that, classically, a monotone function $m \colon D \to [0,1]$ is a (finitely additive, probability) measure provided $m(0) = 0$, $m(1) = 1$, and $m(a) + m(b) = m(a \vee b) + m(a \wedge b)$ for every $a, b \in D$. The latter property is equivalently expressed as

$$\forall a, b \in D, \ m(a) - m(a \wedge b) = m(a \vee b) - m(b). \tag{5}$$

We write $\mathcal{M}_I(D)$ for the set of all measures $D \to [0,1]$, and regard it as an ordered topological space, with the structure induced by the product order and product topology of $[0,1]^D$. The notion of (finitely additive, probability) $\mathbf{\Gamma}$-valued measure is analogous to the classical one, except that the finite additivity property (5) splits into two conditions, involving $-$ and $\sim$.
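For a concrete classical instance (an illustrative example, not from the text): the normalised counting measure on a power-set lattice is monotone and satisfies the modularity law (5).

```python
from fractions import Fraction as Q
from itertools import combinations

# Sketch: the normalised counting measure m(a) = |a|/|X| on the lattice
# of subsets of a finite set X is a finitely additive probability measure.
X = frozenset(range(4))

def m(a):
    return Q(len(a), len(X))

subsets = [frozenset(s) for k in range(len(X) + 1)
           for s in combinations(X, k)]
assert m(frozenset()) == 0 and m(X) == 1
for a in subsets:
    for b in subsets:
        # equation (5): m(a) - m(a /\ b) = m(a \/ b) - m(b)
        assert m(a) - m(a & b) == m(a | b) - m(b)
```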

**Definition 3.** *Let* $D$ *be a distributive lattice. A* $\mathbf{\Gamma}$-valued measure *(or simply a* measure*) on* $D$ *is a function* $\mu \colon D \to \mathbf{\Gamma}$ *such that:*

*1.* $\mu(0) = 0^\diamond$ *and* $\mu(1) = 1^\diamond$*;*
*2.* $\mu$ *is monotone;*
*3. for all* $a, b \in D$*,*

$$\mu(a) \sim \mu(a \wedge b) \le \mu(a \vee b) - \mu(b) \quad \text{and} \quad \mu(a) - \mu(a \wedge b) \ge \mu(a \vee b) \sim \mu(b).$$

*We denote by* $\mathcal{M}_\Gamma(D)$ *the subspace of* $\mathbf{\Gamma}^D$ *consisting of the measures* $\mu \colon D \to \mathbf{\Gamma}$*.*

Since $\mathbf{\Gamma}$ is a Priestley space, so is $\mathbf{\Gamma}^D$ equipped with the product order and topology. Hence, we regard $\mathcal{M}_\Gamma(D)$ as an ordered topological space, whose topology and order are induced by those of $\mathbf{\Gamma}^D$. In fact, $\mathcal{M}_\Gamma(D)$ is a Priestley space:

**Proposition 4.** *For any distributive lattice* $D$*,* $\mathcal{M}_\Gamma(D)$ *is a Priestley space.*

*Proof.* It suffices to show that $\mathcal{M}_\Gamma(D)$ is a closed subspace of $\mathbf{\Gamma}^D$. Let

$$C\_{1,2} = \{ f \in \Gamma^D \mid f(0) = 0^\diamond \} \cap \{ f \in \Gamma^D \mid f(1) = 1^\diamond \} \cap \bigcap\_{a \le b} \{ f \in \Gamma^D \mid f(a) \le f(b) \}.$$

Note that the evaluation maps $\mathrm{ev}_a \colon \mathbf{\Gamma}^D \to \mathbf{\Gamma}$, $f \mapsto f(a)$, are continuous for every $a \in D$. Thus, the first set in the intersection defining $C_{1,2}$ is closed because it is the equaliser of the evaluation map $\mathrm{ev}_0$ and the constant map of value $0^\diamond$; similarly for the set $\{f \in \mathbf{\Gamma}^D \mid f(1) = 1^\diamond\}$. The last one is the intersection of the sets of the form $\langle \mathrm{ev}_a, \mathrm{ev}_b \rangle^{-1}(\le)$, which are closed because $\le$ is closed in $\mathbf{\Gamma} \times \mathbf{\Gamma}$. Whence, $C_{1,2}$ is a closed subset of $\mathbf{\Gamma}^D$. Moreover,

$$\mathcal{M}\_{\Gamma}(D) = \bigcap\_{a,b \in D} \{ f \in C\_{1,2} \mid f(a) \sim f(a \wedge b) \le f(a \vee b) - f(b) \}$$

$$\cap \bigcap\_{a,b \in D} \{ f \in C\_{1,2} \mid f(a) - f(a \wedge b) \ge f(a \vee b) \sim f(b) \}.$$

From the semicontinuity of $-$ and $\sim$ (Lemma 1) and the following well-known fact from the theory of ordered topological spaces, we conclude that $\mathcal{M}_\Gamma(D)$ is closed in $\mathbf{\Gamma}^D$.

*Fact.* Let $X, Y$ be compact ordered spaces, $f \colon X \to Y$ a lower semicontinuous function and $g \colon X \to Y$ an upper semicontinuous function. If $X'$ is a closed subset of $X$, then so is $E = \{x \in X' \mid g(x) \le f(x)\}$.

Next, we prove a property which is very useful when approximating a fragment of a logic by smaller fragments (see, e.g., Section 5.1). Let us denote by **DLat** the category of distributive lattices and homomorphisms, and by **Pries** the category of Priestley spaces and continuous monotone maps.

**Proposition 5.** *The assignment* $D \mapsto \mathcal{M}_\Gamma(D)$ *yields a contravariant functor* $\mathcal{M}_\Gamma \colon$ **DLat** $\to$ **Pries** *which sends directed colimits to codirected limits.*

*Proof.* If $h \colon D \to E$ is a lattice homomorphism and $\mu \colon E \to \mathbf{\Gamma}$ is a measure, it is not difficult to see that $\mathcal{M}_\Gamma(h)(\mu) = \mu \cdot h \colon D \to \mathbf{\Gamma}$ is a measure. The mapping $\mathcal{M}_\Gamma(h) \colon \mathcal{M}_\Gamma(E) \to \mathcal{M}_\Gamma(D)$ is clearly monotone. For continuity, recall that the topology of $\mathcal{M}_\Gamma(D)$ is generated by the sets $\llbracket a < q \rrbracket = \{\nu \colon D \to \mathbf{\Gamma} \mid \nu(a) < q^\diamond\}$ and $\llbracket a \ge q \rrbracket = \{\nu \colon D \to \mathbf{\Gamma} \mid \nu(a) \ge q^\diamond\}$, with $a \in D$ and $q \in \mathbb{Q} \cap [0,1]$. We have

$$\mathcal{M}\_{\Gamma}(h)^{-1}(\llbracket a < q \rrbracket) = \{ \mu \colon E \to \Gamma \mid \mu(h(a)) < q^{\diamond} \} = \llbracket h(a) < q \rrbracket$$

which is open in $\mathcal{M}_\Gamma(E)$. Similarly, $\mathcal{M}_\Gamma(h)^{-1}(\llbracket a \ge q \rrbracket) = \llbracket h(a) \ge q \rrbracket$, showing that $\mathcal{M}_\Gamma(h)$ is continuous. Thus, $\mathcal{M}_\Gamma$ is a contravariant functor.

The rest of the proof is a routine verification.

*Remark 6.* We work with the contravariant functor $\mathcal{M}_\Gamma \colon$ **DLat** $\to$ **Pries** because $\mathcal{M}_\Gamma$ is concretely defined on the lattice side. However, by Priestley duality, **DLat** is dually equivalent to **Pries**, so we can think of $\mathcal{M}_\Gamma$ as a covariant functor **Pries** $\to$ **Pries** (this is the perspective traditionally adopted in analysis, and also in the works of Nešetřil and Ossona de Mendez). From this viewpoint, Section 6 provides a description of the endofunctor on **DLat** dual to $\mathcal{M}_\Gamma \colon$ **Pries** $\to$ **Pries**.

Recall the maps $\gamma \colon \mathbf{\Gamma} \to [0,1]$ and $\iota \colon [0,1] \to \mathbf{\Gamma}$ from equations (3)–(4). In Section 3.2 we showed that this is a retraction-section pair. In Theorem 10 this retraction is lifted to the spaces of measures. We start with an easy observation:

**Lemma 7.** *Let* $D$ *be a distributive lattice. The following statements hold:*

*1. if* $\mu \in \mathcal{M}_\Gamma(D)$*, then* $\gamma \cdot \mu \in \mathcal{M}_I(D)$*;*
*2. if* $m \in \mathcal{M}_I(D)$*, then* $\iota \cdot m \in \mathcal{M}_\Gamma(D)$*.*
*Proof.* 1. The only non-trivial condition to verify is finite additivity. In view of the discussion after Lemma 2, the map $\gamma$ preserves both minus operations on $\mathbf{\Gamma}$. Hence, for every $a, b \in D$, the inequalities $\mu(a) \sim \mu(a \wedge b) \le \mu(a \vee b) - \mu(b)$ and $\mu(a) - \mu(a \wedge b) \ge \mu(a \vee b) \sim \mu(b)$ imply that $\gamma \cdot \mu(a) - \gamma \cdot \mu(a \wedge b) = \gamma \cdot \mu(a \vee b) - \gamma \cdot \mu(b)$.

2. The first two conditions in Definition 3 are immediate. The third condition follows from the fact that $\iota(r - s) = \iota(r) - \iota(s)$ whenever $s \le r$ in $[0,1]$, and $x \sim y \le x - y$ for every $(x, y) \in \mathrm{dom}(-)$.

In view of the previous lemma, there are well-defined functions

$$\gamma^\# \colon \mathcal{M}_\Gamma(D) \to \mathcal{M}_I(D), \ \mu \mapsto \gamma \cdot \mu \qquad \text{and} \qquad \iota^\# \colon \mathcal{M}_I(D) \to \mathcal{M}_\Gamma(D), \ m \mapsto \iota \cdot m.$$

**Lemma 8.** $\gamma^\# \colon \mathcal{M}_\Gamma(D) \to \mathcal{M}_I(D)$ *is a continuous and monotone map.*

*Proof.* The topology of $\mathcal{M}_I(D)$ is generated by the sets of the form $\{m \in \mathcal{M}_I(D) \mid m(a) \in O\}$, for $a \in D$ and $O$ an open subset of $[0,1]$. In turn,

$$(\gamma^\#)^{-1}\{m \in \mathcal{M}_I(D) \mid m(a) \in O\} = \{\mu \in \mathcal{M}_\Gamma(D) \mid \mu(a) \in \gamma^{-1}(O)\}$$

is open in $\mathcal{M}_\Gamma(D)$ because $\gamma \colon \mathbf{\Gamma} \to [0,1]$ is continuous by Lemma 2. This shows that $\gamma^\# \colon \mathcal{M}_\Gamma(D) \to \mathcal{M}_I(D)$ is continuous. Monotonicity is immediate.

Note that $\gamma^\# \colon \mathcal{M}_\Gamma(D) \to \mathcal{M}_I(D)$ is surjective, since it admits $\iota^\#$ as a (set-theoretic) section. It follows that $\mathcal{M}_I(D)$ is a compact ordered space:

**Corollary 9.** *For each distributive lattice* $D$*,* $\mathcal{M}_I(D)$ *is a compact ordered space.*

*Proof.* The surjection $\gamma^\# \colon \mathcal{M}_\Gamma(D) \to \mathcal{M}_I(D)$ is continuous (Lemma 8). Since $\mathcal{M}_\Gamma(D)$ is compact by Proposition 4, so is $\mathcal{M}_I(D)$. The order of $\mathcal{M}_I(D)$ is clearly closed in the product topology, thus $\mathcal{M}_I(D)$ is a compact ordered space.

Finally, we see that the set-theoretic retraction of $\mathcal{M}_\Gamma(D)$ onto $\mathcal{M}_I(D)$ lifts to the topological setting, provided we restrict to the down-set topologies. If $(X, \le)$ is a partially ordered topological space, write $X^\downarrow$ for the space with the same underlying set as $X$ and whose topology consists of the open down-sets of $X$.

**Theorem 10.** *The maps* $\gamma^\# \colon \mathcal{M}_\Gamma(D)^\downarrow \to \mathcal{M}_I(D)^\downarrow$ *and* $\iota^\# \colon \mathcal{M}_I(D)^\downarrow \to \mathcal{M}_\Gamma(D)^\downarrow$ *form a retraction-section pair of topological spaces.*

*Proof.* It suffices to show that $\gamma^\#$ and $\iota^\#$ are continuous. It is not difficult to see, using Lemma 8, that $\gamma^\# \colon \mathcal{M}_\Gamma(D)^\downarrow \to \mathcal{M}_I(D)^\downarrow$ is continuous. For the continuity of $\iota^\#$, note that the topology of $\mathcal{M}_\Gamma(D)^\downarrow$ is generated by the sets of the form $\{\mu \in \mathcal{M}_\Gamma(D) \mid \mu(a) \le q^-\}$, for $a \in D$ and $q \in \mathbb{Q} \cap (0,1]$. We have

$$\begin{aligned} (\iota^\#)^{-1}\{\mu \in \mathcal{M}_\Gamma(D) \mid \mu(a) \le q^-\} &= \{m \in \mathcal{M}_I(D) \mid m(a) \in \iota^{-1}(\downarrow q^-)\} \\ &= \{m \in \mathcal{M}_I(D) \mid m(a) < q\}, \end{aligned}$$

which is an open set in $\mathcal{M}_I(D)^\downarrow$. This concludes the proof.

## **5 The Γ-valued Stone pairing and limits of finite structures**

In the work of Nešetřil and Ossona de Mendez, the Stone pairing $\langle -, A \rangle$ is $[0,1]$-valued, i.e. an element of $\mathcal{M}_I(\mathrm{FO}(\sigma))$. In this section, we show that essentially the same construction as for the recognisers arising from the application of a layer of semiring quantifiers in logic on words (cf. Section 2.3) provides an embedding of finite $\sigma$-structures into the space of $\mathbf{\Gamma}$-valued measures. It turns out that this embedding is a $\mathbf{\Gamma}$-valued version of the Stone pairing. Hereafter we make a notational difference, writing $\langle -, - \rangle_I$ for the (classical) $[0,1]$-valued Stone pairing.

The main ingredient of the construction is the set of $\mathbf{\Gamma}$-valued finitely supported functions. To start with, we point out that the partial operation $-$ on $\mathbf{\Gamma}$ uniquely determines a partial "plus" operation on $\mathbf{\Gamma}$. Define

$$+ \colon \mathrm{dom}(+) \to \Gamma, \quad \text{where} \quad \mathrm{dom}(+) = \{(x, y) \mid x \le 1^\diamond - y\},$$

by the following rules (whenever the expressions make sense):

$$r^\diamond + s^\diamond = (r+s)^\diamond, \qquad r^- + s^\diamond = (r+s)^-, \qquad r^\diamond + s^- = (r+s)^-, \qquad r^- + s^- = (r+s)^-.$$

Then, for every $y \in \mathbf{\Gamma}$, the function $(-) + y$ sending $x$ to $x + y$ is left adjoint to the function $(-) - y$ sending $x$ to $x - y$.

**Definition 11.** *For any set* $X$*,* $\mathcal{F}(X)$ *is the set of all functions* $f \colon X \to \mathbf{\Gamma}$ *such that:*

*1. the support* $\mathrm{supp}(f) = \{x \in X \mid f(x) \neq 0^\diamond\}$ *of* $f$ *is finite, and*
*2. the sum* $\sum \{f(x) \mid x \in \mathrm{supp}(f)\}$ *exists in* $\mathbf{\Gamma}$ *and is equal to* $1^\diamond$*.*
To improve readability, if the sum $y_1 + \dots + y_m$ exists in $\mathbf{\Gamma}$, we denote it by $\sum_{i=1}^m y_i$. Finitely supported functions in the above sense always determine measures over the power-set algebra (the proof is an easy verification and is omitted):

**Lemma 12.** *Let* $X$ *be any set. There is a well-defined mapping* $\int \colon \mathcal{F}(X) \to \mathcal{M}_\Gamma(\wp(X))$*, assigning to every* $f \in \mathcal{F}(X)$ *the measure*

$$\int f \colon M \mapsto \int\_M f = \sum \{ f(x) \mid x \in M \cap \text{supp}(f) \}.$$
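For $\diamond$-tagged rational values, Lemma 12 can be sketched directly (the support set and the weights below are illustrative): the induced measure of $M$ just sums $f$ over $M \cap \mathrm{supp}(f)$.

```python
from fractions import Fraction as Q

MINUS, DIAM = 0, 1   # tags for r^- and q^<>

# Sketch: a finitely supported f : X -> Gamma, given by its support,
# induces a measure on the power set of X (elements outside supp(f)
# contribute nothing, i.e. 0^<>).
def integral(f, M):
    total = Q(0)
    for x, (v, tag) in f.items():
        if x in M:
            total += v
    return (total, DIAM)

X = {"p", "q", "r"}
f = {"p": (Q(1, 2), DIAM), "q": (Q(1, 4), DIAM), "r": (Q(1, 4), DIAM)}
assert integral(f, X) == (Q(1), DIAM)            # total mass is 1^<>
assert integral(f, {"p", "q"}) == (Q(3, 4), DIAM)
assert integral(f, set()) == (Q(0), DIAM)
```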

#### **5.1 The Γ-valued Stone pairing and logic on words**

Fix a countably infinite set of variables $\{v_1, v_2, \dots\}$. Recall that $\mathrm{FO}_n(\sigma)$ is the Lindenbaum-Tarski algebra of first-order formulas with free variables among $\{v_1, \dots, v_n\}$. The dual space of $\mathrm{FO}_n(\sigma)$ is the space of $n$-types $\mathrm{Typ}_n(\sigma)$. Its points are the equivalence classes of pairs $(A, \alpha)$, where $A$ is a $\sigma$-structure and $\alpha \colon \{v_1, \dots, v_n\} \to A$ is an interpretation of the variables. Write $\mathrm{Fin}(\sigma)$ for the set of all finite $\sigma$-structures and define a map $\mathrm{Fin}(\sigma) \to \mathcal{F}(\mathrm{Typ}_n(\sigma))$ as $A \mapsto f^A_n$, where $f^A_n$ is the function which sends an equivalence class $E \in \mathrm{Typ}_n(\sigma)$ to

$$f_n^A(E) = \sum_{(A,\alpha) \in E} \left(\frac{1}{|A|^n}\right)^{\!\diamond} \quad \left(\text{add } \tfrac{1}{|A|^n} \text{ for every interpretation } \alpha \text{ of the free variables such that } (A, \alpha) \text{ lies in } E\right).$$

By Lemma 12, we get a measure $\int f^A_n \colon \wp(\mathrm{Typ}_n(\sigma)) \to \mathbf{\Gamma}$. Now, for each $\varphi \in \mathrm{FO}_n(\sigma)$, let $[\varphi]_n \subseteq \mathrm{Typ}_n(\sigma)$ be the set of (equivalence classes of) $\sigma$-structures with interpretations satisfying $\varphi$. By Stone duality we obtain an embedding $[-]_n \colon \mathrm{FO}_n(\sigma) \to \wp(\mathrm{Typ}_n(\sigma))$. Restricting $\int f^A_n$ to $\mathrm{FO}_n(\sigma)$, we get a measure

$$\mu\_n^A \colon \mathrm{FO}\_n(\sigma) \to \Gamma, \quad \varphi \mapsto \int\_{\left[\varphi\right]\_n} f\_n^A \, .$$

Summing up, we have the composite map

$$\mathrm{Fin}(\sigma) \to \mathcal{M}_\Gamma(\wp(\mathrm{Typ}_n(\sigma))) \to \mathcal{M}_\Gamma(\mathrm{FO}_n(\sigma)), \quad A \mapsto \int f_n^A \mapsto \mu_n^A. \tag{6}$$

Essentially the same construction is featured in logic on words, cf. equation (2):


On the other hand, the assignment $A \mapsto \mu^A_n$ defined in (6) is also closely related to the classical Stone pairing. Indeed, for every formula $\varphi$ in $\mathrm{FO}_n(\sigma)$,

$$\mu_n^A(\varphi) = \sum_{E \in [\varphi]_n} f_n^A(E) = \sum_{E \in [\varphi]_n} \sum_{(A,\alpha) \in E} \left(\frac{1}{|A|^n}\right)^{\!\diamond} = \left(\frac{|\{\overline{a} \in A^n \mid A \models \varphi(\overline{a})\}|}{|A|^n}\right)^{\!\diamond} = (\langle \varphi, A \rangle_I)^\diamond. \tag{7}$$

In this sense, $\mu^A_n$ can be regarded as a $\mathbf{\Gamma}$-valued Stone pairing, relative to the fragment $\mathrm{FO}_n(\sigma)$. Next, we show how to extend this to the full first-order logic $\mathrm{FO}(\sigma)$. First, we observe that the construction is invariant under extensions of the set of free variables (the proof is the same as in the classical case).
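Concretely, the value $(\langle \varphi, A \rangle_I)^\diamond$ computed in equation (7) is the proportion of $n$-tuples satisfying $\varphi$; in the sketch below the structure (a three-element chain for $\sigma = \{<\}$) and the formula $\varphi(v_1)$ expressing that $v_1$ is maximal are illustrative stand-ins.

```python
from fractions import Fraction as Q
from itertools import product

# Sketch of the fraction in equation (7): the proportion of n-tuples of
# a finite structure A that satisfy a formula phi.
A = {0, 1, 2}                                    # a 3-element chain
lt = {(a, b) for a in A for b in A if a < b}     # interpretation of <

def phi(a):
    """phi(v1): 'v1 is maximal', i.e. for all y, not (v1 < y)."""
    return all((a, y) not in lt for y in A)

n = 1
satisfying = [t for t in product(A, repeat=n) if phi(*t)]
pairing = Q(len(satisfying), len(A) ** n)        # <phi, A>_I
assert pairing == Q(1, 3)                        # only the top element is maximal
```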

**Lemma 13.** *Given* $m, n \in \mathbb{N}$ *and* $A \in \mathrm{Fin}(\sigma)$*, if* $m \ge n$ *then* $(\mu^A_m){\restriction}_{\mathrm{FO}_n(\sigma)} = \mu^A_n$*.*

The Lindenbaum-Tarski algebra $\mathrm{FO}(\sigma)$ of all first-order formulas is the directed colimit of the Boolean subalgebras $\mathrm{FO}_n(\sigma)$, for $n \in \mathbb{N}$. Since the functor $\mathcal{M}_\Gamma$ turns directed colimits into codirected limits (Proposition 5), the Priestley space $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$ is the limit of the diagram

$$\left\{ \mathcal{M}\_{\Gamma}(\mathrm{FO}\_{n}(\sigma)) \xleftarrow{q\_{n,m}} \mathcal{M}\_{\Gamma}(\mathrm{FO}\_{m}(\sigma)) \mid m, n \in \mathbb{N}, \ m \ge n \right\}.$$

where, for any $\mu \colon \mathrm{FO}_m(\sigma) \to \mathbf{\Gamma}$ in $\mathcal{M}_\Gamma(\mathrm{FO}_m(\sigma))$, the measure $q_{n,m}(\mu)$ is the restriction of $\mu$ to $\mathrm{FO}_n(\sigma)$. In view of Lemma 13, for every $A \in \mathrm{Fin}(\sigma)$, the tuple $(\mu^A_n)_{n \in \mathbb{N}}$ is compatible with the restriction maps. Thus, recalling that limits in the category of Priestley spaces are computed as in sets, by universality of the limit construction this tuple yields a measure

$$\langle \text{-}, A \rangle\_{\Gamma} : \text{FO}(\sigma) \to \Gamma$$

in the space $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$. This we call the $\mathbf{\Gamma}$*-valued Stone pairing* associated with $A$. As in the classical case, it is not difficult to see that the mapping $A \mapsto \langle -, A \rangle_\Gamma$ gives an embedding $\langle -, - \rangle_\Gamma \colon \mathrm{Fin}(\sigma) \to \mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$. The following theorem illustrates the relation between the classical Stone pairing $\langle -, - \rangle_I \colon \mathrm{Fin}(\sigma) \to \mathcal{M}_I(\mathrm{FO}(\sigma))$ and the $\mathbf{\Gamma}$-valued one.

**Theorem 14.** *The following diagram commutes:*

$$\begin{array}{ccc} \mathrm{Fin}(\sigma) & \xrightarrow{\ \langle -, - \rangle_\Gamma\ } & \mathcal{M}_\Gamma(\mathrm{FO}(\sigma)) \\ & \underset{\langle -, - \rangle_I}{\searrow} & \big\downarrow \gamma^\# \\ & & \mathcal{M}_I(\mathrm{FO}(\sigma)) \end{array}$$

*Proof.* Fix an arbitrary finite structure $A \in \mathrm{Fin}(\sigma)$. Let $\varphi$ be a formula in $\mathrm{FO}(\sigma)$ with free variables among $\{v_1, \dots, v_n\}$, for some $n \in \mathbb{N}$. By construction, $\langle \varphi, A \rangle_\Gamma = \mu^A_n(\varphi)$. Therefore, by equation (7), $\langle \varphi, A \rangle_\Gamma = (\langle \varphi, A \rangle_I)^\diamond$. The statement then follows at once.

**Remark.** The construction in this section also works for proper fragments, i.e. for sublattices $D \subseteq \mathrm{FO}(\sigma)$. This corresponds to composing the embedding $\mathrm{Fin}(\sigma) \to \mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$ with the restriction map $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma)) \to \mathcal{M}_\Gamma(D)$ sending $\mu \colon \mathrm{FO}(\sigma) \to \mathbf{\Gamma}$ to $\mu{\restriction}_D \colon D \to \mathbf{\Gamma}$. The only difference is that the ensuing map $\mathrm{Fin}(\sigma) \to \mathcal{M}_\Gamma(D)$ need not be injective, in general.

#### **5.2 Limits in the spaces of measures**

By Theorem 14, the $\mathbf{\Gamma}$-valued Stone pairing $\langle -, - \rangle_\Gamma$ and the classical Stone pairing $\langle -, - \rangle_I$ determine each other. However, the notions of convergence associated with the spaces $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$ and $\mathcal{M}_I(\mathrm{FO}(\sigma))$ are different: since the topology of $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$ is richer, there are "fewer" convergent sequences. Recall from Lemma 8 that $\gamma^\# \colon \mathcal{M}_\Gamma(\mathrm{FO}(\sigma)) \to \mathcal{M}_I(\mathrm{FO}(\sigma))$ is continuous. Also, $\gamma^\#(\langle -, A \rangle_\Gamma) = \langle -, A \rangle_I$ by Theorem 14. Thus, for any sequence of finite structures $(A_n)_{n \in \mathbb{N}}$, if

$$\langle -, A_n \rangle_\Gamma \longrightarrow \mu \quad \text{in } \mathcal{M}_\Gamma(\mathrm{FO}(\sigma)),$$

then

$$\langle -, A_n \rangle_I \longrightarrow \gamma^\#(\mu) \quad \text{in } \mathcal{M}_I(\mathrm{FO}(\sigma)).$$

The converse is not true. For example, consider the signature $\sigma = \{<\}$ consisting of a single binary relation symbol, and let $(A_n)_{n \in \mathbb{N}}$ be the sequence of finite posets displayed in the picture below.

Let $\psi(x) \approx \forall y\, \neg(x < y) \wedge \exists z\, (\neg(z < x) \wedge \neg(z = x))$ be the formula stating that $x$ is maximal but not the maximum in the order given by $<$. Then, for the sublattice $D = \{\mathbf{f}, \psi, \mathbf{t}\}$ of $\mathrm{FO}(\sigma)$, the sequences $\langle -, A_n \rangle_\Gamma$ and $\langle -, A_n \rangle_I$ converge in $\mathcal{M}_\Gamma(D)$ and $\mathcal{M}_I(D)$, respectively. However, if we consider the Boolean algebra $B = \{\mathbf{f}, \psi, \neg\psi, \mathbf{t}\}$, then the $\langle -, A_n \rangle_I$'s still converge whereas the $\langle -, A_n \rangle_\Gamma$'s do not. Indeed, the following sequence does not converge in $\mathbf{\Gamma}$:

$$(\langle \neg \psi, A\_n \rangle\_\Gamma)\_n = (1^\diamond, (\frac{1}{3})^\diamond, 1^\diamond, (\frac{2}{4})^\diamond, 1^\diamond, (\frac{3}{5})^\diamond, \dots),$$

because the odd terms converge to $1^\diamond$, while the even terms converge to $1^-$. However, there is a sequence $\langle -, B_n \rangle_\Gamma$ whose image under $\gamma^\#$ coincides with the limit of the $\langle -, A_n \rangle_I$'s (e.g., take the subsequence of even terms of $(A_n)_{n \in \mathbb{N}}$). In the next theorem, we will see that this is a general fact.

Identify $\mathrm{Fin}(\sigma)$ with a subset of $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$ (resp. $\mathcal{M}_I(\mathrm{FO}(\sigma))$) through $\langle -, - \rangle_\Gamma$ (resp. $\langle -, - \rangle_I$). A central question in the theory of structural limits, cf. [16], is to determine the closure of $\mathrm{Fin}(\sigma)$ in $\mathcal{M}_I(\mathrm{FO}(\sigma))$, which consists precisely of the limits of sequences of finite structures. The following theorem gives an answer to this question in terms of the corresponding question for $\mathcal{M}_\Gamma(\mathrm{FO}(\sigma))$.

**Theorem 15.** *Let* $\overline{\mathrm{Fin}(\sigma)}$ *denote the closure of* $\mathrm{Fin}(\sigma)$ *in* $\mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$*. Then the set* $\gamma\_{\#}(\overline{\mathrm{Fin}(\sigma)})$ *coincides with the closure of* $\mathrm{Fin}(\sigma)$ *in* $\mathcal{M}\_I(\mathrm{FO}(\sigma))$*.*

*Proof.* Write $U$ for the image of $\langle -, - \rangle\_\Gamma \colon \mathrm{Fin}(\sigma) \to \mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$, and $V$ for the image of $\langle -, - \rangle\_I \colon \mathrm{Fin}(\sigma) \to \mathcal{M}\_I(\mathrm{FO}(\sigma))$. We must prove that $\gamma\_{\#}(\overline{U}) = \overline{V}$. By Theorem 14, $\gamma\_{\#}(U) = V$. The map $\gamma\_{\#} \colon \mathcal{M}\_\Gamma(\mathrm{FO}(\sigma)) \to \mathcal{M}\_I(\mathrm{FO}(\sigma))$ is continuous (Lemma 8), and the spaces $\mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$ and $\mathcal{M}\_I(\mathrm{FO}(\sigma))$ are compact Hausdorff (Proposition 4 and Corollary 9). Since continuous maps between compact Hausdorff spaces are closed, $\gamma\_{\#}(\overline{U}) = \overline{\gamma\_{\#}(U)} = \overline{V}$.

## **6 The logic of measures**

Let $D$ be a distributive lattice. We know from Proposition 4 that the space $\mathcal{M}\_\Gamma(D)$ of $\mathbf{\Gamma}$-valued measures on $D$ is a Priestley space, whence it has a dual distributive lattice $\mathbf{P}(D)$. In this section we show that $\mathbf{P}(D)$ can be represented as the Lindenbaum-Tarski algebra for a propositional logic $\mathsf{PL}\_D$ obtained from $D$ by adding probabilistic quantifiers. Since we adopt a logical perspective, we write $\mathbf{f}$ and $\mathbf{t}$ for the bottom and top elements of $D$, respectively.

The set of propositional variables of $\mathsf{PL}\_D$ consists of the symbols $\mathbb{P}\_{\geq p}\, a$, for every $a \in D$ and $p \in \mathbb{Q} \cap [0, 1]$. For every measure $\mu \in \mathcal{M}\_\Gamma(D)$, we set

$$\mu \models \mathbb{P}\_{\geq p}\, a \iff \mu(a) \geq p^{\diamond}. \tag{8}$$

This satisfaction relation extends in the obvious way to the closure under finite conjunctions and finite disjunctions of the set of propositional variables. Define

$$
\varphi \models \psi \quad \text{if,} \quad \forall \mu \in \mathcal{M}\_\Gamma(D), \quad \mu \models \varphi \text{ implies } \mu \models \psi.
$$

Also, write $\models \varphi$ if $\mu \models \varphi$ for every $\mu \in \mathcal{M}\_\Gamma(D)$, and $\varphi \models$ if there is no $\mu \in \mathcal{M}\_\Gamma(D)$ with $\mu \models \varphi$.

Consider the following conditions, for any $p, q, r \in \mathbb{Q} \cap [0, 1]$ and $a, b \in D$.

(L1) $\mathbb{P}\_{\geq q}\, a \models \mathbb{P}\_{\geq p}\, a$ whenever $p \leq q$

(L2) $\mathbb{P}\_{\geq p}\, \mathbf{f} \models$ whenever $p > 0$, and $\models \mathbb{P}\_{\geq 0}\, \mathbf{f}$ and $\models \mathbb{P}\_{\geq q}\, \mathbf{t}$

(L3) $\mathbb{P}\_{\geq q}\, a \models \mathbb{P}\_{\geq q}\, b$ whenever $a \leq b$

(L4) $\mathbb{P}\_{\geq p}\, a \wedge \mathbb{P}\_{\geq q}\, b \models \mathbb{P}\_{\geq p+q-r}\, (a \vee b) \vee \mathbb{P}\_{\geq r}\, (a \wedge b)$ whenever $0 \leq p+q-r \leq 1$

(L5) $\mathbb{P}\_{\geq p+q-r}\, (a \vee b) \wedge \mathbb{P}\_{\geq r}\, (a \wedge b) \models \mathbb{P}\_{\geq p}\, a \vee \mathbb{P}\_{\geq q}\, b$ whenever $0 \leq p+q-r \leq 1$

It is not hard to see that the interpretation in (8) validates these conditions:

**Lemma 16.** *The conditions (L1)–(L5) are satisfied in* $\mathcal{M}\_\Gamma(D)$*.*
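For intuition: over ordinary $[0,1]$-valued (modular) measures, (L4) and (L5) reflect the modular law $\mu(a \vee b) + \mu(a \wedge b) = \mu(a) + \mu(b)$. The following sanity check uses a hypothetical example not taken from the paper: the uniform measure on subsets of a three-element set, which form a distributive lattice under union and intersection.

```python
from fractions import Fraction

# Hypothetical example: uniform [0,1]-valued measure on subsets of {0,1,2}.
universe = {0, 1, 2}
def mu(s):
    return Fraction(len(s), len(universe))

a, b = {0, 1}, {1, 2}
p, q = mu(a), mu(b)  # so mu(a) >= p and mu(b) >= q hold

# Modularity: mu(a or b) + mu(a and b) = mu(a) + mu(b).
assert mu(a | b) + mu(a & b) == mu(a) + mu(b)

# Instance of (L4): whenever 0 <= p+q-r <= 1, from mu(a) >= p and mu(b) >= q
# it follows that mu(a or b) >= p+q-r or mu(a and b) >= r.
for r in [Fraction(n, 3) for n in range(4)]:
    if 0 <= p + q - r <= 1:
        assert mu(a | b) >= p + q - r or mu(a & b) >= r
```

Of course, in the $\mathbf{\Gamma}$-valued setting the conditions also track whether each threshold is achieved or only approached.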

Write $\mathbf{P}(D)$ for the quotient of the free distributive lattice on the set

$$\{\mathbb{P}\_{\geq p} \: a \mid p \in \mathbb{Q} \cap [0,1], \ a \in D\}$$

with respect to the congruence generated by the conditions (L1)–(L5).

**Proposition 17.** *Let* $F \subseteq \mathbf{P}(D)$ *be a prime filter. The assignment*

$$a \mapsto \bigvee \{ q^{\diamond} \mid \mathbb{P}\_{\geq q}\, a \in F \}$$

*defines a measure* $\mu\_F \colon D \to \mathbf{\Gamma}$*.*

*Proof.* Items (L2) and (L3) take care of the first two conditions defining $\mathbf{\Gamma}$-valued measures (cf. Definition 3). We prove the first half of the third condition, as the other half is proved in a similar fashion. We must show that, for every $a, b \in D$,

$$
\mu\_F(a) - \mu\_F(a \wedge b) \leq \mu\_F(a \vee b) - \mu\_F(b). \tag{9}
$$

It is not hard to show that $\mu\_F(a) - r^{\diamond} = \bigvee \{ p\_1^{\diamond} - r^{\diamond} \mid r^{\diamond} \leq p\_1^{\diamond} \leq \mu\_F(a) \}$, and that $x - (-)$ transforms non-empty joins into meets (this follows by Scott continuity of $x - (-)$ seen as a map $[0^{\diamond}, x] \to \mathbf{\Gamma}^{\partial}$). Hence, equation (9) is equivalent to

$$\bigvee \{ p^\diamond - r^\diamond \mid \mu\_F(a \land b) < r^\diamond \le p^\diamond \le \mu\_F(a) \} \le \bigwedge \{ \mu\_F(a \lor b) - q^\diamond \mid q^\diamond \le \mu\_F(b) \}.$$

To settle this inequality it is enough to show that, provided $\mu\_F(a \wedge b) < r^{\diamond} \leq p^{\diamond} \leq \mu\_F(a)$ and $q^{\diamond} \leq \mu\_F(b)$, we have $(p - r)^{\diamond} \leq \mu\_F(a \vee b) - q^{\diamond}$. The latter inequality is equivalent to $(p + q - r)^{\diamond} \leq \mu\_F(a \vee b)$. In turn, using (L4) and the fact that $F$ is a prime filter, $\mathbb{P}\_{\geq p}\, a, \mathbb{P}\_{\geq q}\, b \in F$ and $\mathbb{P}\_{\geq r}\, (a \wedge b) \notin F$ entail $\mathbb{P}\_{\geq p+q-r}\, (a \vee b) \in F$. Whence,

$$\mu\_F(a \vee b) = \bigvee \{ s^{\diamond} \mid \mathbb{P}\_{\geq s}\, (a \vee b) \in F \} \geq (p + q - r)^{\diamond}.$$

We can now describe the dual lattice of $\mathcal{M}\_\Gamma(D)$ as the Lindenbaum-Tarski algebra for the logic $\mathsf{PL}\_D$, built from the propositional variables $\mathbb{P}\_{\geq p}\, a$ by imposing the laws (L1)–(L5).

**Theorem 18.** *Let* $D$ *be a distributive lattice. Then the lattice* $\mathbf{P}(D)$ *is isomorphic to the distributive lattice dual to the Priestley space* $\mathcal{M}\_\Gamma(D)$*.*

*Proof.* Let $X\_{\mathbf{P}(D)}$ be the space dual to $\mathbf{P}(D)$. By Proposition 17 there is a map $\vartheta \colon X\_{\mathbf{P}(D)} \to \mathcal{M}\_\Gamma(D)$, $F \mapsto \mu\_F$. We claim that $\vartheta$ is an isomorphism of Priestley spaces. Clearly, $\vartheta$ is monotone. If $\mu\_{F\_1}(a) \not\leq \mu\_{F\_2}(a)$ for some $a \in D$, we have

$$\bigvee \{ q^{\diamond} \mid \mathbb{P}\_{\geq q}\, a \in F\_1 \} = \mu\_{F\_1}(a) \not\leq \mu\_{F\_2}(a) = \bigwedge \{ p^- \mid \mathbb{P}\_{\geq p}\, a \notin F\_2 \}. \tag{10}$$

Equation (10) implies the existence of $p, q$ satisfying $\mathbb{P}\_{\geq q}\, a \in F\_1$, $\mathbb{P}\_{\geq p}\, a \notin F\_2$ and $q \geq p$. It follows by (L1) that $\mathbb{P}\_{\geq p}\, a \in F\_1$. We conclude that $\mathbb{P}\_{\geq p}\, a \in F\_1 \setminus F\_2$, whence $F\_1 \not\subseteq F\_2$. This shows that $\vartheta$ is an order embedding, whence injective.

We prove that $\vartheta$ is surjective, thus a bijection. Fix a measure $\mu \in \mathcal{M}\_\Gamma(D)$. It is not hard to see, using Lemma 16, that the filter $F\_\mu \subseteq \mathbf{P}(D)$ generated by

$$\{\mathbb{P}\_{\geq q} \: a \mid a \in D, \ q \in \mathbb{Q} \cap [0,1], \ \mu(a) \geq q^{\diamond}\}$$

is prime. Further, $\vartheta(F\_\mu)(a) = \bigvee \{ q^{\diamond} \mid \mathbb{P}\_{\geq q}\, a \in F\_\mu \} = \bigvee \{ q^{\diamond} \mid \mu(a) \geq q^{\diamond} \} = \mu(a)$ for every $a \in D$. Hence, $\vartheta(F\_\mu) = \mu$ and $\vartheta$ is surjective.

To settle the theorem it remains to show that $\vartheta$ is continuous. Note that for a basic clopen of the form $C = \{ \mu \in \mathcal{M}\_\Gamma(D) \mid \mu(a) \geq p^{\diamond} \}$ where $a \in D$ and $p \in \mathbb{Q} \cap [0, 1]$, the preimage $\vartheta^{-1}(C) = \{ F \subseteq \mathbf{P}(D) \mid \mu\_F(a) \geq p^{\diamond} \}$ is equal to

$$\{F \in X\_{\mathbf{P}(D)} \mid \bigvee \{q^\diamond \mid \mathbb{P}\_{\geq q} a \in F\} \geq p^\diamond\} = \{F \in X\_{\mathbf{P}(D)} \mid \mathbb{P}\_{\geq p} a \in F\},$$

which is a clopen of $X\_{\mathbf{P}(D)}$. Similarly, if $C = \{ \mu \in \mathcal{M}\_\Gamma(D) \mid \mu(a) \leq q^- \}$ for some $a \in D$ and $q \in \mathbb{Q} \cap (0, 1]$, by the claim above $\vartheta^{-1}(C) = \{ F \in X\_{\mathbf{P}(D)} \mid \mathbb{P}\_{\geq q}\, a \notin F \}$, which is again a clopen of $X\_{\mathbf{P}(D)}$.

By Theorem 18, for any distributive lattice $D$, the lattice of clopen up-sets of $\mathcal{M}\_\Gamma(D)$ is isomorphic to the Lindenbaum-Tarski algebra $\mathbf{P}(D)$ of our *positive* propositional logic $\mathsf{PL}\_D$. Moving from the lattice of clopen up-sets to the Boolean algebra of all clopens logically corresponds to adding negation to the logic. The logic obtained this way can be presented as follows. Introduce a new propositional variable $\mathbb{P}\_{<q}\, a$, for each $a \in D$ and $q \in \mathbb{Q} \cap [0, 1]$. For a measure $\mu \in \mathcal{M}\_\Gamma(D)$, set

$$\mu \models \mathbb{P}\_{<q}\, a \iff \mu(a) \leq q^-.$$

We also add a new rule, stating that $\mathbb{P}\_{<q}\, a$ is the negation of $\mathbb{P}\_{\geq q}\, a$:

(L6) $\mathbb{P}\_{<q}\, a \wedge \mathbb{P}\_{\geq q}\, a \models$ and $\models \mathbb{P}\_{<q}\, a \vee \mathbb{P}\_{\geq q}\, a$

Clearly, (L6) is satisfied in $\mathcal{M}\_\Gamma(D)$. Moreover, the Boolean algebra of *all* clopens of $\mathcal{M}\_\Gamma(D)$ is isomorphic to the quotient of the free distributive lattice on

$$\{\mathbb{P}\_{\geq p}\, a \mid p \in \mathbb{Q} \cap [0,1],\ a \in D\} \cup \{\mathbb{P}\_{<p}\, a \mid p \in \mathbb{Q} \cap [0,1],\ a \in D\}$$

with respect to the congruence generated by the conditions (L1)–(L6).

*Specialising to* $\mathrm{FO}(\sigma)$*.* Let us briefly discuss what happens when we instantiate $D$ with the full first-order logic $\mathrm{FO}(\sigma)$. For a formula $\varphi \in \mathrm{FO}(\sigma)$ with free variables $v\_1, \ldots, v\_n$ and a $q \in \mathbb{Q} \cap [0, 1]$, we have two new sentences $\mathbb{P}\_{\geq q}\, \varphi$ and $\mathbb{P}\_{<q}\, \varphi$. For a finite $\sigma$-structure $A$ identified with its $\mathbf{\Gamma}$-valued Stone pairing $\langle -, A \rangle\_\Gamma$,

$$A \models \mathbb{P}\_{\geq q}\, \varphi \ \text{(resp. } A \models \mathbb{P}\_{<q}\, \varphi \text{)} \iff \langle \varphi, A \rangle\_\Gamma \geq q^{\diamond} \ \text{(resp. } \langle \varphi, A \rangle\_\Gamma \leq q^- \text{)}.$$

That is, $\mathbb{P}\_{\geq q}\, \varphi$ is true in $A$ if a random assignment of the variables $v\_1, \ldots, v\_n$ in $A$ satisfies $\varphi$ with probability at least $q$. Similarly for $\mathbb{P}\_{<q}\, \varphi$. If we regard $\mathbb{P}\_{\geq q}$ and $\mathbb{P}\_{<q}$ as probabilistic quantifiers that bind all free variables of a given formula, the Stone pairing $\langle -, - \rangle\_\Gamma \colon \mathrm{Fin} \to \mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$ can be seen as the embedding of finite structures into the space of types for the logic $\mathsf{PL}\_{\mathrm{FO}(\sigma)}$.
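Concretely, the classical Stone pairing $\langle \varphi, A \rangle$ of a formula with $n$ free variables is the fraction of assignments in $A$ satisfying $\varphi$. A small sketch, where the structure, the formula, and the function name are illustrative assumptions rather than anything from the paper:

```python
from fractions import Fraction
from itertools import product

def stone_pairing(A, phi, nvars):
    """Fraction of assignments of nvars elements of A satisfying phi."""
    sat = sum(1 for v in product(A, repeat=nvars) if phi(*v))
    return Fraction(sat, len(A) ** nvars)

# Example: A is the 3-element chain 0 < 1 < 2; phi(x) says "x is maximal".
A = [0, 1, 2]
def maximal(x):
    return all(not (x < y) for y in A)

# Only 2 is maximal, so <phi, A> = 1/3: A satisfies P_{>=1/3} phi but not P_{>=1/2} phi.
assert stone_pairing(A, maximal, 1) == Fraction(1, 3)
```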

## **Conclusion**

Types are points of the dual space of a logic (viewed as a Boolean algebra). In classical first-order logic, 0-types are just the models modulo elementary equivalence. But when there are not 'enough' models, as in finite model theory, the spaces of types provide completions of the sets of models.

In [5], it was shown that for logic on words and various quantifiers, given a Boolean algebra of formulas with a free variable, the space of types of the Boolean algebra generated by the formulas obtained by quantification is given by a measure space construction. Here we have shown that a suitable enrichment of first-order logic gives rise to a space of measures $\mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$ closely related to the space $\mathcal{M}\_I(\mathrm{FO}(\sigma))$ used in the theory of structural limits. Indeed, Theorem 14 tells us that the ensuing Stone pairings determine each other. Further, the Stone pairing for $\mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$ is just the embedding of the finite models in the completion/compactification provided by the space of types of the enriched logic.

These results identify the logical gist of the theory of structural limits, and provide a new and interesting connection between logic on words and the theory of structural limits in finite model theory. But we also expect the framework to prove a useful tool in its own right. For structural limits, it is an open problem to characterise the closure of the image of the $[0, 1]$-valued Stone pairing [16]. Reasoning in the $\mathbf{\Gamma}$-valued setting, which is native to logic and where we can use duality, one would expect that this is the subspace $\mathcal{M}\_\Gamma(\mathrm{Th}(\mathrm{Fin}))$ of $\mathcal{M}\_\Gamma(\mathrm{FO}(\sigma))$ given by the quotient $\mathrm{FO}(\sigma) \twoheadrightarrow \mathrm{Th}(\mathrm{Fin})$ onto the theory of pseudofinite structures. The purpose of such a characterisation would be to understand the points of the closure as "generalised models". Another subject that we would like to investigate is that of zero-one laws. The zero-one law for first-order logic states that the sequence of measures for which the $n$th measure, on a sentence $\psi$, yields the proportion of $n$-element structures satisfying $\psi$, converges to a $\{0, 1\}$-valued measure. Over $\mathbf{\Gamma}$ this will no longer be true, as $1$ is split into its 'limiting' and 'achieved' personae. Yet, we expect the above sequence to converge also in this setting and, by Theorem 14, it will converge to a $\{0^{\diamond}, 1^-, 1^{\diamond}\}$-valued measure. Understanding this more fine-grained measure may yield useful information about the zero-one law.

Further, it would be interesting to investigate whether the limits for schema mappings introduced by Kolaitis *et al.* [13] may be seen also as a type-theoretic construction. Finally, we would want to explore the connections with other semantically inspired approaches to finite model theory, such as those recently put forward by Abramsky, Dawar *et al.* [2,3].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing**

Mathieu Huot$^{1*}$, Sam Staton$^{1*}$, and Matthijs Vákár$^{2*}$

$^1$ University of Oxford, UK. $^2$ Utrecht University, The Netherlands. $^*$ Equal contribution. mathieu.huot@stx.ox.ac.uk

**Abstract.** We present semantic correctness proofs of Automatic Differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Finally, we sketch how the analysis extends to other AD methods by considering a continuation-based method.

## **1 Introduction**

Automatic differentiation (AD), loosely speaking, is the process of taking a program describing a function, and building the derivative of that function by applying the chain rule across the program code. As gradients play a central role in many aspects of machine learning, so too do automatic differentiation systems such as TensorFlow [1] or Stan [6].

Differentiation has a well developed mathematical theory in terms of differential geometry. The aim of this paper is to formalize this connection between differential geometry and the syntactic operations of AD. In this way we achieve two things: (1) a compositional, denotational understanding of differentiable programming and AD; (2) an explanation of the correctness of AD.

**Fig. 1.** Overview of semantics/correctness of AD.

This intuitive correspondence (summarized in Fig. 1) is in fact rather complicated. In this paper we focus on resolving the following problem: higher order functions play a key role in programming, and yet they have no counterpart in traditional differential geometry. Moreover, we resolve this problem while retaining the compositionality of denotational semantics.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 319–338, 2020. https://doi.org/10.1007/978-3-030-45231-5\_17

**Higher order functions and differentiation.** A major application of higher order functions is to support disciplined code reuse. The need for code reuse is particularly acute in machine learning. For example, a multi-layer neural network might be built of millions of near-identical neurons, as follows.

$$\begin{array}{ll}
\mathrm{neuron}\_n : (\textbf{real}^n \ast (\textbf{real}^n \ast \textbf{real})) \to \textbf{real} & \mathrm{neuron}\_n \stackrel{\text{def}}{=} \lambda \langle x, \langle w, b \rangle \rangle.\ \varsigma(w \cdot x + b) \\[2pt]
\mathrm{layer}\_n : ((\tau\_1 \ast P) \to \tau\_2) \to (\tau\_1 \ast P^n) \to \tau\_2^n & \mathrm{layer}\_n \stackrel{\text{def}}{=} \lambda f.\ \lambda \langle x, \langle p\_1, \ldots, p\_n \rangle \rangle.\ \langle f\langle x, p\_1 \rangle, \ldots, f\langle x, p\_n \rangle \rangle \\[2pt]
\mathrm{comp} : (((\tau\_1 \ast P) \to \tau\_2) \ast ((\tau\_2 \ast Q) \to \tau\_3)) \to (\tau\_1 \ast (P \ast Q)) \to \tau\_3 & \mathrm{comp} \stackrel{\text{def}}{=} \lambda \langle f, g \rangle.\ \lambda \langle x, \langle p, q \rangle \rangle.\ g\langle f\langle x, p \rangle, q \rangle
\end{array}$$

(Here $\varsigma(x) \stackrel{\text{def}}{=} \frac{1}{1+e^{-x}}$ is the sigmoid function.) We can use these functions to build a network as follows (see also Fig. 2):

$$\mathrm{comp}\langle \mathrm{layer}\_m(\mathrm{neuron}\_k),\ \mathrm{comp}\langle \mathrm{layer}\_n(\mathrm{neuron}\_m),\ \mathrm{neuron}\_n \rangle \rangle : (\textbf{real}^k \ast P) \to \textbf{real} \tag{1}$$

Here $P \cong \textbf{real}^p$ with $p = m(k+1) + n(m+1) + n + 1$. This program (1) describes a smooth (infinitely differentiable) function. The goal of automatic differentiation is to find its derivative.
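To make the code reuse concrete, here is a sketch of neuron, layer and comp as ordinary higher-order functions in a general-purpose language, not the paper's calculus; the parameter values are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(x, params):
    # (real^n * (real^n * real)) -> real
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def layer(f):
    # ((t1 * P) -> t2) -> (t1 * P^n) -> t2^n: apply f with each parameter block
    return lambda x, ps: [f(x, p) for p in ps]

def comp(f, g):
    # (((t1*P) -> t2) * ((t2*Q) -> t3)) -> (t1 * (P*Q)) -> t3
    return lambda x, pq: g(f(x, pq[0]), pq[1])

# A network with k = 2 inputs, one hidden layer of two neurons, one output neuron:
hidden = [([1.0, -1.0], 0.0), ([0.5, 0.5], 0.1)]
out = ([1.0, 1.0], -0.2)
net = comp(layer(neuron), neuron)
y = net([0.3, 0.7], (hidden, out))
assert 0.0 < y < 1.0  # sigmoid outputs lie in (0, 1)
```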

If we β-reduce all the λ's, we end up with a very long function expression just built from the sigmoid function and linear algebra. We can then find a program for calculating its derivative by applying the chain rule. However, automatic differentiation can also be expressed without first β-reducing, in a compositional way, by explaining how higher order functions like (layer) and (comp) propagate derivatives.

**Fig. 2.** The network in (1) with k inputs and two hidden layers.

This paper is a semantic analysis of this compositional approach.

The general idea of denotational semantics is to interpret types as spaces and programs as functions between the spaces. In this paper, we propose to use diffeological spaces and smooth functions [32, 16] to this end. These satisfy the following three desiderata:


We emphasise that the most standard formulation of differential geometry, using manifolds, does not support spaces of functions. Diffeological spaces seem to us the simplest notion of space that satisfies these conditions, but there are other candidates [3, 33]. A diffeological space is in particular a set $X$ equipped with a chosen set of curves $C\_X \subseteq X^{\mathbb{R}}$, and a smooth map $f : X \to Y$ must be such that if $\gamma \in C\_X$ then $\gamma; f \in C\_Y$. This is reminiscent of the method of logical relations.

**From smoothness to automatic derivatives at higher types.** Our denotational semantics in diffeological spaces guarantees that all definable functions are smooth. But we need more than just to know that a definable function happens to have a mathematical derivative: we need to be able to find that derivative.

In this paper we focus on a simple, forward mode automatic differentiation method, which is a macro translation on syntax (called −→D in §2). We are able to show that it is correct, using our denotational semantics.

Here there is one subtle point that is central to our development. Although differential geometry provides established derivatives for first order functions (such as neuron above), there is no canonical notion of derivative for higher order functions (such as layer and comp) in the theory of diffeological spaces (e.g. [7]). We propose a new way to resolve this, by interpreting types as triples $(X, X', S)$ where, intuitively, $X$ is a space of inhabitants of the type, $X'$ is a space serving as a chosen bundle of tangents over $X$, and $S \subseteq X^{\mathbb{R}} \times X'^{\mathbb{R}}$ is a binary relation between curves, informally relating curves in $X$ with their tangent curves in $X'$. This new model gives a denotational semantics for automatic differentiation.

In §3 we boil this new approach down to a straightforward and elementary logical relations argument for the correctness of automatic differentiation. The approach is explained in detail in §5.

**Related work and context.** AD has a long history and has many implementations. AD was perhaps first phrased in a functional setting in [26], and there are now a number of teams working on AD in the functional setting (e.g. [34, 31, 12]), some providing efficient implementations. Although that work does not involve formal semantics, it is inspired by intuitions from differential geometry and category theory.

This paper adds to a very recent body of work on verified automatic differentiation. Much of this is concurrent with and independent from the work in this article. In the first order setting, there are recent accounts based on denotational semantics in manifolds [13] and based on synthetic differential geometry [9], as well as work making a categorical abstraction [8] and work connecting operational semantics with denotational semantics [2, 28]. Recently there has also been significant progress at higher types. The work of Brunel et al. gives formal correctness proofs for reverse-mode derivatives on computation graphs [5]. The work of Barthe et al. [4] provides a general discussion of some new syntactic logical relations arguments including one very similar to our syntactic proof of Theorem 1. We understand that the authors of [9] are working on higher types.

The differential λ-calculus [11] is related to AD, and explicit connections are made in [22, 23]. One difference is that the differential λ-calculus allows addition of terms at all types, and hence vector space models are suitable, but this appears peculiar with the variant and inductive types that we consider here.

Finally we emphasise that we have chosen the neural network (1) as our running example mainly for its simplicity. There are many other examples of AD outside the neural networks literature: AD is useful whenever derivatives need to be calculated on high dimensional spaces. This includes optimization problems more generally, where the derivative is passed to a gradient descent method (e.g. [30, 18, 29, 19, 10, 21]). Other applications of AD are in advanced integration methods, since derivatives play a role in Hamiltonian Monte Carlo [25, 14] and variational inference [20].

**Summary of contributions.** We have provided a semantic analysis of automatic differentiation. Our syntactic starting point is a well-known forward-mode AD macro on a typed higher order language (e.g. [31, 34]). We recall this in §2 for function types, and in §4 we extend it to inductive types and variants. The main contributions of this paper are as follows.


## **2 A simple forward-mode AD translation**

**Rudiments of differentiation and dual numbers.** Recall that the derivative of a function $f : \mathbb{R} \to \mathbb{R}$, if it exists, is a function $\nabla f : \mathbb{R} \to \mathbb{R}$ such that $\nabla f(x\_0) = \frac{\mathrm{d}f(x)}{\mathrm{d}x}(x\_0)$ is the gradient of $f$ at $x\_0$.

To find ∇f in a compositional way, two generalizations are reasonable:


Thus we are more generally interested in transforming a function $g : \mathbb{R}^n \to \mathbb{R}$ into a function $h : (\mathbb{R} \times \mathbb{R})^n \to \mathbb{R} \times \mathbb{R}$ in such a way that for any $f\_1, \ldots, f\_n : \mathbb{R} \to \mathbb{R}$,

$$(f\_1, \nabla f\_1, \dots, f\_n, \nabla f\_n); h = ((f\_1, \dots, f\_n); g,\ \nabla((f\_1, \dots, f\_n); g)). \tag{2}$$

An intuition for $h$ is often given in terms of dual numbers. The transformed function operates on pairs of numbers, $(x, x')$, and it is common to think of such a pair as $x + x'\epsilon$ for an 'infinitesimal' $\epsilon$. But while this is a helpful intuition, the formalization of infinitesimals can be intricate, and the development in this paper is focussed on the elementary formulation in (2).

The reader may also notice that $h$ encodes all the partial derivatives of $g$. For example, if $g : \mathbb{R}^2 \to \mathbb{R}$, then with $f\_1(x) \stackrel{\text{def}}{=} x$ and $f\_2(x) \stackrel{\text{def}}{=} x\_2$, by applying (2) to $x\_1$ we obtain $h(x\_1, 1, x\_2, 0) = (g(x\_1, x\_2), \frac{\partial g(x, x\_2)}{\partial x}(x\_1))$ and similarly $h(x\_1, 0, x\_2, 1) = (g(x\_1, x\_2), \frac{\partial g(x\_1, x)}{\partial x}(x\_2))$. And conversely, if $g$ is differentiable in each argument, then a unique $h$ satisfying (2) can be found by taking linear combinations of partial derivatives:

$$h(x\_1, x\_1', x\_2, x\_2') = (g(x\_1, x\_2), x\_1' \cdot \frac{\partial g(x, x\_2)}{\partial x}(x\_1) + x\_2' \cdot \frac{\partial g(x\_1, x)}{\partial x}(x\_2)).$$

In summary, the idea of differentiation with dual numbers is to transform a differentiable function $g : \mathbb{R}^n \to \mathbb{R}$ into a function $h : \mathbb{R}^{2n} \to \mathbb{R}^2$ which captures $g$ and all its partial derivatives. We packaged this up in (2) as a sort-of invariant which is useful for building derivatives of compound functions $\mathbb{R} \to \mathbb{R}$ in a compositional way. The idea of forward mode automatic differentiation is to perform this transformation at the source code level.
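As a minimal sketch of the dual-numbers transformation in (2), take the primitive $g(x\_1, x\_2) = x\_1 \cdot x\_2$ (a hypothetical choice of $g$; the name $h$ matches the text):

```python
def h(x1, dx1, x2, dx2):
    # dual-numbers transform of g(x1, x2) = x1 * x2:
    # pair the value with the tangent given by the product rule
    return (x1 * x2, dx1 * x2 + x1 * dx2)

# h(x1, 1, x2, 0) recovers dg/dx1, and h(x1, 0, x2, 1) recovers dg/dx2:
assert h(3.0, 1.0, 4.0, 0.0) == (12.0, 4.0)
assert h(3.0, 0.0, 4.0, 1.0) == (12.0, 3.0)
```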

**A simple language of smooth functions.** We consider a standard higher order typed language with a first order type **real** of real numbers. The types (τ,σ) and terms (t, s) are as follows.

$$\begin{array}{lll}
\tau, \sigma, \rho & ::= & \textbf{real} \mid (\tau\_1 \ast \ldots \ast \tau\_n) \mid \tau \to \sigma \\
t, s, r & ::= & x \mid c \mid t + s \mid t \ast s \mid \varsigma(t) \mid \langle t\_1, \ldots, t\_n \rangle \mid \textbf{case}\ t\ \textbf{of}\ \langle x\_1, \ldots, x\_n \rangle \to s \mid \lambda x. t \mid t\, s
\end{array}$$

The typing rules are in Figure 3. We have included a minimal set of operations for the sake of illustration, but it is not difficult to add further operations. We add some simple syntactic sugar $t - u \stackrel{\text{def}}{=} t + (-1) \ast u$. We intend $\varsigma$ to stand for the sigmoid function, $\varsigma(x) \stackrel{\text{def}}{=} \frac{1}{1+e^{-x}}$. We further include syntactic sugar $\textbf{let}\ x = t\ \textbf{in}\ s$ for $(\lambda x. s)\, t$, and $\lambda \langle x\_1, \ldots, x\_n \rangle. t$ for $\lambda x. \textbf{case}\ x\ \textbf{of}\ \langle x\_1, \ldots, x\_n \rangle \to t$.

**Syntactic automatic differentiation: a functorial macro.** The aim of forward mode AD is to find the dual numbers representation of a function by syntactic manipulations. For our simple language, we implement this as the following inductively defined macro −→D on both types and terms (see also [34, 31]):

$$\begin{array}{ll}
\overrightarrow{\mathcal{D}}(\textbf{real}) \stackrel{\text{def}}{=} (\textbf{real} \ast \textbf{real}) & \overrightarrow{\mathcal{D}}(\tau \to \sigma) \stackrel{\text{def}}{=} \overrightarrow{\mathcal{D}}(\tau) \to \overrightarrow{\mathcal{D}}(\sigma) \\[2pt]
\overrightarrow{\mathcal{D}}((\tau\_1 \ast \cdots \ast \tau\_n)) \stackrel{\text{def}}{=} (\overrightarrow{\mathcal{D}}(\tau\_1) \ast \cdots \ast \overrightarrow{\mathcal{D}}(\tau\_n)) &
\end{array}$$

**Fig. 3.** Typing rules for the simple language. (The rules are the standard ones for variables, constants and operations, tuples, pattern matching, abstraction, and application; their two-dimensional layout is not reproduced here.)

$$\begin{array}{l}
\overrightarrow{\mathcal{D}}(x) \stackrel{\text{def}}{=} x \qquad \overrightarrow{\mathcal{D}}(c) \stackrel{\text{def}}{=} \langle c, 0 \rangle \\[2pt]
\overrightarrow{\mathcal{D}}(t + s) \stackrel{\text{def}}{=} \textbf{case}\ \overrightarrow{\mathcal{D}}(t)\ \textbf{of}\ \langle x, x' \rangle \to \textbf{case}\ \overrightarrow{\mathcal{D}}(s)\ \textbf{of}\ \langle y, y' \rangle \to \langle x + y,\ x' + y' \rangle \\[2pt]
\overrightarrow{\mathcal{D}}(t \ast s) \stackrel{\text{def}}{=} \textbf{case}\ \overrightarrow{\mathcal{D}}(t)\ \textbf{of}\ \langle x, x' \rangle \to \textbf{case}\ \overrightarrow{\mathcal{D}}(s)\ \textbf{of}\ \langle y, y' \rangle \to \langle x \ast y,\ x' \ast y + x \ast y' \rangle \\[2pt]
\overrightarrow{\mathcal{D}}(\varsigma(t)) \stackrel{\text{def}}{=} \textbf{case}\ \overrightarrow{\mathcal{D}}(t)\ \textbf{of}\ \langle x, x' \rangle \to \textbf{let}\ y = \varsigma(x)\ \textbf{in}\ \langle y,\ x' \ast y \ast (1 - y) \rangle \\[2pt]
\overrightarrow{\mathcal{D}}(\lambda x. t) \stackrel{\text{def}}{=} \lambda x. \overrightarrow{\mathcal{D}}(t) \qquad \overrightarrow{\mathcal{D}}(t\, s) \stackrel{\text{def}}{=} \overrightarrow{\mathcal{D}}(t)\ \overrightarrow{\mathcal{D}}(s) \\[2pt]
\overrightarrow{\mathcal{D}}(\langle t\_1, \ldots, t\_n \rangle) \stackrel{\text{def}}{=} \langle \overrightarrow{\mathcal{D}}(t\_1), \ldots, \overrightarrow{\mathcal{D}}(t\_n) \rangle \\[2pt]
\overrightarrow{\mathcal{D}}(\textbf{case}\ t\ \textbf{of}\ \langle x\_1, \ldots, x\_n \rangle \to s) \stackrel{\text{def}}{=} \textbf{case}\ \overrightarrow{\mathcal{D}}(t)\ \textbf{of}\ \langle x\_1, \ldots, x\_n \rangle \to \overrightarrow{\mathcal{D}}(s)
\end{array}$$

We extend $\overrightarrow{\mathcal{D}}$ to contexts: $\overrightarrow{\mathcal{D}}(\{x\_1 : \tau\_1, \ldots, x\_n : \tau\_n\}) \stackrel{\text{def}}{=} \{x\_1 : \overrightarrow{\mathcal{D}}(\tau\_1), \ldots, x\_n : \overrightarrow{\mathcal{D}}(\tau\_n)\}$. This turns $\overrightarrow{\mathcal{D}}$ into a well-typed, functorial macro in the following sense.

**Lemma 1 (Functorial macro).** *If* $\Gamma \vdash t : \tau$ *then* $\overrightarrow{\mathcal{D}}(\Gamma) \vdash \overrightarrow{\mathcal{D}}(t) : \overrightarrow{\mathcal{D}}(\tau)$*. If* $\Gamma, x : \sigma \vdash t : \tau$ *and* $\Gamma \vdash s : \sigma$ *then* $\overrightarrow{\mathcal{D}}(\Gamma) \vdash \overrightarrow{\mathcal{D}}(t[s/x]) = \overrightarrow{\mathcal{D}}(t)[\overrightarrow{\mathcal{D}}(s)/x]$*.*
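For intuition, the first-order clauses of $\overrightarrow{\mathcal{D}}$ can be read as an evaluator on dual numbers. The following sketch uses an assumed tuple-based AST, not the paper's language, and mirrors the clauses for variables, constants, $+$, $\ast$ and $\varsigma$:

```python
import math

def eval_dual(t, env):
    """Evaluate a first-order term to a dual number (value, tangent)."""
    tag = t[0]
    if tag == 'var':
        return env[t[1]]                      # D(x) = x, now dual-valued
    if tag == 'const':
        return (t[1], 0.0)                    # D(c) = <c, 0>
    if tag == 'add':
        (x, dx), (y, dy) = eval_dual(t[1], env), eval_dual(t[2], env)
        return (x + y, dx + dy)
    if tag == 'mul':
        (x, dx), (y, dy) = eval_dual(t[1], env), eval_dual(t[2], env)
        return (x * y, dx * y + x * dy)       # product rule
    if tag == 'sig':
        x, dx = eval_dual(t[1], env)
        y = 1.0 / (1.0 + math.exp(-x))
        return (y, dx * y * (1.0 - y))        # sigmoid clause of the macro
    raise ValueError(tag)

# d/dx (x*x + x) at x = 3 is 2*3 + 1 = 7:
t = ('add', ('mul', ('var', 'x'), ('var', 'x')), ('var', 'x'))
assert eval_dual(t, {'x': (3.0, 1.0)}) == (12.0, 7.0)
```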

*Example 1 (Inner products).* Let us write $\tau^n$ for the $n$-fold product $(\tau \ast \ldots \ast \tau)$. Then, given $\Gamma \vdash t, s : \textbf{real}^n$ we can define their inner product

$$\Gamma \vdash t \cdot\_n s \stackrel{\text{def}}{=} \textbf{case}\ t\ \textbf{of}\ \langle z\_1, \ldots, z\_n \rangle \to \textbf{case}\ s\ \textbf{of}\ \langle y\_1, \ldots, y\_n \rangle \to z\_1 \ast y\_1 + \cdots + z\_n \ast y\_n : \textbf{real}$$

To illustrate the calculation of $\overrightarrow{\mathcal{D}}$, let us expand (and $\beta$-reduce) $\overrightarrow{\mathcal{D}}(t \cdot\_2 s)$:

$$\begin{array}{l}
\textbf{case}\ \overrightarrow{\mathcal{D}}(t)\ \textbf{of}\ \langle z\_1, z\_2 \rangle \to \textbf{case}\ \overrightarrow{\mathcal{D}}(s)\ \textbf{of}\ \langle y\_1, y\_2 \rangle \to \\
\quad \textbf{case}\ z\_1\ \textbf{of}\ \langle z\_{1,1}, z\_{1,2} \rangle \to \textbf{case}\ y\_1\ \textbf{of}\ \langle y\_{1,1}, y\_{1,2} \rangle \to \\
\quad \textbf{case}\ z\_2\ \textbf{of}\ \langle z\_{2,1}, z\_{2,2} \rangle \to \textbf{case}\ y\_2\ \textbf{of}\ \langle y\_{2,1}, y\_{2,2} \rangle \to \\
\quad \langle z\_{1,1} \ast y\_{1,1} + z\_{2,1} \ast y\_{2,1},\ z\_{1,1} \ast y\_{1,2} + z\_{1,2} \ast y\_{1,1} + z\_{2,1} \ast y\_{2,2} + z\_{2,2} \ast y\_{2,1} \rangle
\end{array}$$

Example 2 (Neural networks). In our introduction (1), we provided a program in our language to build a neural network out of expressions neuron, layer, comp; this program makes use of the inner product of Ex. 1. We can similarly calculate −→D of such deep neural nets by mechanically applying the macro.

## **3 Semantics of differentiation**

Consider for a moment the first order fragment of the language in §2, with only one type, **real**, and no $\lambda$'s or pairs. This has a simple semantics in the category of cartesian spaces and smooth maps. Indeed, a term $x\_1, \ldots, x\_n : \textbf{real} \vdash t : \textbf{real}$ has a natural reading as a function $\llbracket t \rrbracket : \mathbb{R}^n \to \mathbb{R}$, by interpreting our operation symbols by the well-known operations on $\mathbb{R}^n \to \mathbb{R}$ with the corresponding name. In fact, the functions that are definable in this first order fragment are smooth, which means that they are continuous, differentiable, and their derivatives are continuous, differentiable, and so on. Let us write **CartSp** for this category of cartesian spaces ($\mathbb{R}^n$ for some $n$) and smooth functions.

The category **CartSp** has cartesian products, and so we can also interpret product types, tupling and pattern matching, giving us a useful syntax for constructing functions into and out of products of $\mathbb{R}$. For example, the interpretation of (neuron) in (1) becomes

$$\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R} \xrightarrow{\ \llbracket \cdot_n \rrbracket \times \mathrm{id}_{\mathbb{R}}\ } \mathbb{R} \times \mathbb{R} \xrightarrow{\ \llbracket + \rrbracket\ } \mathbb{R} \xrightarrow{\ \varsigma\ } \mathbb{R}.$$

where $\llbracket \cdot_n \rrbracket$, $\llbracket + \rrbracket$ and $\varsigma$ are the usual inner product, addition, and sigmoid function on $\mathbb{R}$, respectively.

Inside this category, we can straightforwardly study the first order language without λ's, and automatic differentiation. In fact, we can prove the following by plain induction on the syntax:

The interpretation of the (syntactic) forward AD −→D (t) of a first-order term t equals the usual (semantic) derivative of the interpretation of t as a smooth function.
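This statement can be checked on a concrete term with a dual-numbers sketch. The helpers below (`d_add`, `d_mul`, `d_sig` — our own names, following the dual-pair representation (2)) apply the syntactic rules term by term; the result agrees with the analytic derivative computed by ordinary calculus.

```python
import math

def d_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def d_mul(a, b):  # product rule
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

def d_sig(a):     # chain rule through the sigmoid, using sigma' = sigma * (1 - sigma)
    s = 1.0 / (1.0 + math.exp(-a[0]))
    return (s, a[1] * s * (1.0 - s))

def f(x):         # f(x) = sigmoid(x * x + x), a first-order term
    return 1.0 / (1.0 + math.exp(-(x * x + x)))

def df(x):        # its analytic derivative: (2x + 1) * f(x) * (1 - f(x))
    return (2 * x + 1) * f(x) * (1 - f(x))

x = 0.3
primal, tangent = d_sig(d_add(d_mul((x, 1.0), (x, 1.0)), (x, 1.0)))
assert abs(primal - f(x)) < 1e-12
assert abs(tangent - df(x)) < 1e-12
```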

However, as is well known, the category **CartSp** does not support function spaces. To see this, notice that we have polynomial terms

$$x_1, \dots, x_d : \mathbf{real} \vdash \lambda y.\ \textstyle\sum_{n=1}^{d} x_n * y^n \;:\; \mathbf{real} \to \mathbf{real}$$

for each $d$, and so if we could interpret $(\mathbf{real} \to \mathbf{real})$ as a Euclidean space $\mathbb{R}^p$ then, by interpreting these polynomial expressions, we would be able to find continuous injections $\mathbb{R}^d \to \mathbb{R}^p$ for every $d$, which is topologically impossible for any $p$, for example as a consequence of the Borsuk–Ulam theorem (see [15], Appx. A).

This means that we cannot interpret the functions (layer) and (comp) from (1) in **CartSp**, as they are higher order functions, even though they are very useful and innocent building blocks for differential programming! Clearly, we could define neural nets such as (1) directly as smooth functions without any higher order subcomponents, though that would quickly become cumbersome for deep networks. A problematic consequence of the lack of a semantics for higher order differential programs is that we have no obvious way of establishing compositional semantic correctness of −→D for the given implementation of (1).

**Diffeological spaces.** This motivates us to turn to a more general notion of differential geometry for our semantics, based on diffeological spaces [16]. The key idea will be that a higher order function is called smooth if it sends smooth functions to smooth functions, meaning that we can never use it to build first order functions that are not smooth. For example, (comp) in (1) has this property.

**Definition 1.** A diffeological space $(X, \mathcal{P}_X)$ consists of a set $X$ together with, for each $n$ and each open subset $U$ of $\mathbb{R}^n$, a set $\mathcal{P}^U_X \subseteq [U \to X]$ of functions, called plots, such that

**–** all constant functions are plots;
**–** if $f : V \to U$ is a smooth function between open subsets of cartesian spaces and $p \in \mathcal{P}^U_X$, then $f; p \in \mathcal{P}^V_X$.

We call a function $f : X \to Y$ between diffeological spaces smooth if, for all plots $p \in \mathcal{P}^U_X$, we have that $p; f \in \mathcal{P}^U_Y$. We write $\mathbf{Diff}(X, Y)$ for the set of smooth maps from $X$ to $Y$. Smooth functions compose, and so we have a category **Diff** of diffeological spaces and smooth functions.

A diffeological space is thus a set equipped with structure. Many constructions of sets carry over straightforwardly to diffeological spaces.

Example 3 (Cartesian diffeologies). Each open subset $U$ of $\mathbb{R}^n$ can be given the structure of a diffeological space by taking all the smooth functions $V \to U$ as $\mathcal{P}^V_U$. It is easily seen that smooth functions $V \to U$ in the traditional sense then coincide with smooth functions in the sense of diffeological spaces. Thus diffeological spaces conservatively extend ordinary multivariate calculus.

In categorical terms, this gives a full embedding of **CartSp** in **Diff**.

Example 4 (Product diffeologies). Given a family $(X_i)_{i \in I}$ of diffeological spaces, we can equip the product $\prod_{i \in I} X_i$ of sets with the product diffeology, in which the $U$-plots are precisely the functions of the form $(p_i)_{i \in I}$ for $p_i \in \mathcal{P}^U_{X_i}$.

This gives us the categorical product in **Diff**.

Example 5 (Functional diffeology). We can equip the set $\mathbf{Diff}(X, Y)$ of smooth functions between diffeological spaces with the functional diffeology, in which the $U$-plots consist of the functions $f : U \to \mathbf{Diff}(X, Y)$ such that $(u, x) \mapsto f(u)(x)$ is an element of $\mathbf{Diff}(U \times X, Y)$.

This specifies the categorical function object in **Diff**.

**Semantics and correctness of AD.** We can now give a denotational semantics to our language from § 2. We interpret each type $\tau$ as a set $\llbracket \tau \rrbracket$ equipped with the relevant diffeology, by induction on the structure of types:

$$\llbracket \mathbf{real} \rrbracket \stackrel{\text{def}}{=} \mathbb{R} \qquad \llbracket (\tau_1 * \dots * \tau_n) \rrbracket \stackrel{\text{def}}{=} \prod_{i=1}^n \llbracket \tau_i \rrbracket \qquad \llbracket \tau \to \sigma \rrbracket \stackrel{\text{def}}{=} \mathbf{Diff}(\llbracket \tau \rrbracket, \llbracket \sigma \rrbracket)$$

A context $\Gamma = (x_1 : \tau_1, \dots, x_n : \tau_n)$ is interpreted as the diffeological space $\llbracket \Gamma \rrbracket \stackrel{\text{def}}{=} \prod_{i=1}^n \llbracket \tau_i \rrbracket$. Now well typed terms $\Gamma \vdash t : \tau$ are interpreted as smooth functions $\llbracket t \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket \tau \rrbracket$, giving a meaning for $t$ at every valuation of the context. This is routinely defined by induction on the structure of typing derivations. Constants $\underline{c} : \mathbf{real}$ are interpreted as constant functions, and the first order operations ($+$, $*$, $\varsigma$) are interpreted by composing with the corresponding functions, which are smooth. For example, $\llbracket \varsigma(t) \rrbracket(\rho) \stackrel{\text{def}}{=} \varsigma(\llbracket t \rrbracket(\rho))$, where $\rho \in \llbracket \Gamma \rrbracket$. Variables are interpreted as projections: $\llbracket x_i \rrbracket(\rho) \stackrel{\text{def}}{=} \rho_i$. The remaining constructs are interpreted as follows, and it is straightforward to show that smoothness is preserved.

$$\begin{aligned} \llbracket \langle t_1, \dots, t_n \rangle \rrbracket(\rho) &\stackrel{\text{def}}{=} (\llbracket t_1 \rrbracket(\rho), \dots, \llbracket t_n \rrbracket(\rho)) &\quad \llbracket \lambda x{:}\tau.\, t \rrbracket(\rho)(a) &\stackrel{\text{def}}{=} \llbracket t \rrbracket(\rho, a) \quad (a \in \llbracket \tau \rrbracket) \\ \llbracket \mathbf{case}\ t\ \mathbf{of}\ \langle \dots \rangle \to s \rrbracket(\rho) &\stackrel{\text{def}}{=} \llbracket s \rrbracket(\rho, \llbracket t \rrbracket(\rho)) &\quad \llbracket t\, s \rrbracket(\rho) &\stackrel{\text{def}}{=} \llbracket t \rrbracket(\rho)(\llbracket s \rrbracket(\rho)) \end{aligned}$$

Notice that a term $x_1 : \mathbf{real}, \dots, x_n : \mathbf{real} \vdash t : \mathbf{real}$ is interpreted as a smooth function $\llbracket t \rrbracket : \mathbb{R}^n \to \mathbb{R}$, even if $t$ involves higher order functions (like (1)). Moreover, the macro differentiation $\overrightarrow{\mathcal{D}}(t)$ is interpreted as a function $\llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket : (\mathbb{R} \times \mathbb{R})^n \to (\mathbb{R} \times \mathbb{R})$. This enables us to state a limited version of our main correctness theorem:

**Theorem 1 (Semantic correctness of** $\overrightarrow{\mathcal{D}}$ **(limited)).** For any term $x_1 : \mathbf{real}, \dots, x_n : \mathbf{real} \vdash t : \mathbf{real}$, the function $\llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket$ is the dual numbers representation (2) of $\llbracket t \rrbracket$. In detail: for any smooth functions $f_1, \dots, f_n : \mathbb{R} \to \mathbb{R}$,

$$(f_1, \nabla f_1, \dots, f_n, \nabla f_n); \llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket = \big( (f_1, \dots, f_n); \llbracket t \rrbracket,\ \nabla((f_1, \dots, f_n); \llbracket t \rrbracket) \big).$$

(For instance, if $n = 2$, then $\llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket(x_1, 1, x_2, 0) = \big( \llbracket t \rrbracket(x_1, x_2),\ \tfrac{\partial \llbracket t \rrbracket(x, x_2)}{\partial x}(x_1) \big)$.) Proof. We prove this by logical relations. Although the following proof is elementary, we found it by using the categorical methods in § 5.

For each type $\tau$, we define a binary relation $S_\tau$ between curves in $\llbracket \tau \rrbracket$ and curves in $\llbracket \overrightarrow{\mathcal{D}}(\tau) \rrbracket$, i.e. $S_\tau \subseteq \mathcal{P}^{\mathbb{R}}_{\llbracket \tau \rrbracket} \times \mathcal{P}^{\mathbb{R}}_{\llbracket \overrightarrow{\mathcal{D}}(\tau) \rrbracket}$, by induction on $\tau$:

**–** $S_{\mathbf{real}} \stackrel{\text{def}}{=} \{(f, (f, \nabla f)) \mid f : \mathbb{R} \to \mathbb{R} \text{ smooth}\}$;
**–** $S_{(\tau * \sigma)} \stackrel{\text{def}}{=} \{((f_1, g_1), (f_2, g_2)) \mid (f_1, f_2) \in S_\tau,\ (g_1, g_2) \in S_\sigma\}$;
**–** $S_{\tau \to \sigma} \stackrel{\text{def}}{=} \{(f_1, f_2) \mid \forall (g_1, g_2) \in S_\tau.\ (x \mapsto f_1(x)(g_1(x)),\ x \mapsto f_2(x)(g_2(x))) \in S_\sigma\}$.

Then, we establish the following 'fundamental lemma':

If $x_1 : \tau_1, \dots, x_n : \tau_n \vdash t : \sigma$ and, for all $1 \le i \le n$, $y_1, \dots, y_m : \mathbf{real} \vdash s_i : \tau_i$ is such that $\big( (f_1, \dots, f_m); \llbracket s_i \rrbracket,\ (f_1, \nabla f_1, \dots, f_m, \nabla f_m); \llbracket \overrightarrow{\mathcal{D}}(s_i) \rrbracket \big) \in S_{\tau_i}$ for all smooth $f_1, \dots, f_m : \mathbb{R} \to \mathbb{R}$, then $\big( (f_1, \dots, f_m); \llbracket t[s_1/x_1, \dots, s_n/x_n] \rrbracket,\ (f_1, \nabla f_1, \dots, f_m, \nabla f_m); \llbracket \overrightarrow{\mathcal{D}}(t[s_1/x_1, \dots, s_n/x_n]) \rrbracket \big)$ is in $S_\sigma$, again for all smooth $f_1, \dots, f_m : \mathbb{R} \to \mathbb{R}$.

This is proved routinely by induction on the typing derivation of t. The case for <sup>∗</sup> relies on the precise definition of −→D (<sup>t</sup> <sup>∗</sup> <sup>s</sup>), and similarly for +, ς.

We conclude the theorem from the fundamental lemma by considering the case where τ<sup>i</sup> = σ = **real**, m = n and s<sup>i</sup> = yi.

## **4 Extending the language: variant and inductive types**

In this section, we show that the definition of forward AD and the semantics generalize if we extend the language of §2 with variants and inductive types. As an example of inductive types, we consider lists. This specific choice is only for expository purposes and the whole development works at the level of generality of arbitrary algebraic data types generated as initial algebras of (polynomial) type constructors formed by finite products and variants.

Similarly, our choice of operations is for expository purposes. More generally, assume given a family of sets of operations $(\mathrm{Op}_n)_{n \in \mathbb{N}}$, indexed by their arity $n$, where each $\mathrm{op} \in \mathrm{Op}_n$ has type $\mathbf{real}^n \to \mathbf{real}$. We then ask for a certain closure of these operations under differentiation; that is, we define

$$\overrightarrow{\mathcal{D}}(\mathrm{op}(t_1, \dots, t_n)) \stackrel{\text{def}}{=} \mathbf{case}\ \overrightarrow{\mathcal{D}}(t_1)\ \mathbf{of}\ \langle x_1, x_1' \rangle \to \dots \to \mathbf{case}\ \overrightarrow{\mathcal{D}}(t_n)\ \mathbf{of}\ \langle x_n, x_n' \rangle \to \big\langle \mathrm{op}(x_1, \dots, x_n),\ \textstyle\sum_{i=1}^n x_i' * \partial_i \mathrm{op}(x_1, \dots, x_n) \big\rangle$$

where ∂iop(x1,...,xn) is some chosen term in the language, involving free variables from x1,...,xn, which we think of as implementing the partial derivative of op with respect to its i-th argument. For constructing the semantics, every op must be interpreted by some smooth function, and, to establish correctness, the semantics of ∂iop(x1,...,xn) must be the semantic i-th partial derivative of the semantics of op(x1,...,xn).
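The generic rule above is mechanical once implementations of $\mathrm{op}$ and its partials $\partial_i \mathrm{op}$ are given. A hedged Python sketch (the names `lift_op` and `dops` are ours, purely illustrative):

```python
def lift_op(op, dops):
    """Turn op : R^n -> R plus its partial derivatives dops[i] = d_i op
    into the dual-numbers version <op(x), sum_i x_i' * d_i op(x)>."""
    def dual_op(*duals):
        xs = [d[0] for d in duals]                              # primals x_i
        tan = sum(d[1] * dop(*xs) for d, dop in zip(duals, dops))
        return (op(*xs), tan)
    return dual_op

# Example: op(x, y) = x * y, with d_1 op = y and d_2 op = x.
dual_mul = lift_op(lambda x, y: x * y,
                   [lambda x, y: y, lambda x, y: x])
print(dual_mul((3.0, 1.0), (4.0, 0.0)))   # (12.0, 4.0): the product rule
```

This recovers the earlier rules for $+$, $*$ and $\varsigma$ as special cases.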

**Language.** We additionally consider the following types and terms:

$$\begin{array}{lcll}
\tau, \sigma, \rho & ::= & \dots & \text{types}\\
& \mid & \{\ell_1\ \tau_1 \mid \dots \mid \ell_n\ \tau_n\} & \text{variant}\\
& \mid & \mathbf{list}(\tau) & \text{list}
\end{array}$$

$$\begin{array}{lcll}
t, s, r & ::= & \dots & \text{terms}\\
& \mid & \tau.\ell\ t & \text{variant constructor}\\
& \mid & [\,] \ \mid\ t :: s & \text{empty list and cons}\\
& \mid & \mathbf{case}\ t\ \mathbf{of}\ \{\ell_1\ x_1 \to s_1 \mid \dots \mid \ell_n\ x_n \to s_n\} & \text{pattern match on variants}\\
& \mid & \mathbf{fold}\ (x_1, x_2).t\ \mathbf{over}\ s\ \mathbf{from}\ r & \text{list fold}
\end{array}$$

We extend the type system according to:

$$\frac{\Gamma \vdash t : \tau_i \quad ((\ell_i\ \tau_i) \in \tau)}{\Gamma \vdash \tau.\ell_i\ t : \tau} \qquad \frac{}{\Gamma \vdash [\,] : \mathbf{list}(\tau)} \qquad \frac{\Gamma \vdash t : \tau \quad \Gamma \vdash s : \mathbf{list}(\tau)}{\Gamma \vdash t :: s : \mathbf{list}(\tau)}$$

$$\frac{\Gamma \vdash t : \{\ell_1\ \tau_1 \mid \dots \mid \ell_n\ \tau_n\} \quad \text{for each } 1 \le i \le n:\ \Gamma, x_i : \tau_i \vdash s_i : \tau}{\Gamma \vdash \mathbf{case}\ t\ \mathbf{of}\ \{\ell_1\ x_1 \to s_1 \mid \dots \mid \ell_n\ x_n \to s_n\} : \tau}$$

$$\frac{\Gamma \vdash s : \mathbf{list}(\tau) \quad \Gamma \vdash r : \sigma \quad \Gamma, x_1 : \tau, x_2 : \sigma \vdash t : \sigma}{\Gamma \vdash \mathbf{fold}\ (x_1, x_2).t\ \mathbf{over}\ s\ \mathbf{from}\ r : \sigma}$$

We can then extend $\overrightarrow{\mathcal{D}}$ to our new types and terms:

$$\begin{aligned}
\overrightarrow{\mathcal{D}}(\{\ell_1\ \tau_1 \mid \dots \mid \ell_n\ \tau_n\}) &\stackrel{\text{def}}{=} \{\ell_1\ \overrightarrow{\mathcal{D}}(\tau_1) \mid \dots \mid \ell_n\ \overrightarrow{\mathcal{D}}(\tau_n)\} & \overrightarrow{\mathcal{D}}(\mathbf{list}(\tau)) &\stackrel{\text{def}}{=} \mathbf{list}(\overrightarrow{\mathcal{D}}(\tau))\\
\overrightarrow{\mathcal{D}}(\tau.\ell\ t) &\stackrel{\text{def}}{=} \overrightarrow{\mathcal{D}}(\tau).\ell\ \overrightarrow{\mathcal{D}}(t) & \overrightarrow{\mathcal{D}}([\,]) &\stackrel{\text{def}}{=} [\,] \qquad \overrightarrow{\mathcal{D}}(t :: s) \stackrel{\text{def}}{=} \overrightarrow{\mathcal{D}}(t) :: \overrightarrow{\mathcal{D}}(s)
\end{aligned}$$
$$\overrightarrow{\mathcal{D}}(\mathbf{case}\ t\ \mathbf{of}\ \{\ell_1\ x_1 \to s_1 \mid \dots \mid \ell_n\ x_n \to s_n\}) \stackrel{\text{def}}{=} \mathbf{case}\ \overrightarrow{\mathcal{D}}(t)\ \mathbf{of}\ \{\ell_1\ x_1 \to \overrightarrow{\mathcal{D}}(s_1) \mid \dots \mid \ell_n\ x_n \to \overrightarrow{\mathcal{D}}(s_n)\}$$
$$\overrightarrow{\mathcal{D}}(\mathbf{fold}\ (x_1, x_2).t\ \mathbf{over}\ s\ \mathbf{from}\ r) \stackrel{\text{def}}{=} \mathbf{fold}\ (x_1, x_2).\overrightarrow{\mathcal{D}}(t)\ \mathbf{over}\ \overrightarrow{\mathcal{D}}(s)\ \mathbf{from}\ \overrightarrow{\mathcal{D}}(r)$$

To demonstrate the practical use of expressive type systems for differential programming, we consider the following two examples.

Example 6 (Lists of inputs for neural nets). Usually, we run a neural network on a large data set, the size of which might only be determined at runtime. To evaluate a neural network on multiple inputs, in practice, one often sums the outcomes. This can be coded in our extended language as follows. Suppose that we have a network $f : (\mathbf{real}^n * P) \to \mathbf{real}$ that operates on single input vectors. We can construct one that operates on lists of inputs as follows:

$$g \stackrel{\text{def}}{=} \lambda \langle l, w \rangle.\ \mathbf{fold}\ (x_1, x_2).\ f \langle x_1, w \rangle + x_2\ \mathbf{over}\ l\ \mathbf{from}\ \underline{0} \;:\; (\mathbf{list}(\mathbf{real}^n) * P) \to \mathbf{real}$$
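Example 6 can be mimicked with dual-number pairs, so that the folded result stays differentiable in the parameter $w$. A hedged Python sketch with a toy one-parameter "network" (all names are ours):

```python
def d_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def d_mul(a, b):  # product rule
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

def f(x, w):      # toy network on a single input: f(x, w) = x * w
    return d_mul(x, w)

def g(inputs, w):
    """fold (x1, x2). f <x1, w> + x2 over l from 0, on dual pairs."""
    acc = (0.0, 0.0)
    for x in inputs:
        acc = d_add(f(x, w), acc)
    return acc

w = (2.0, 1.0)                                   # seed dw/dw = 1
data = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]      # inputs are constants
print(g(data, w))   # (12.0, 6.0): value 2*(1+2+3), derivative d/dw = 1+2+3
```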

Example 7 (Missing data). In practically every application of statistics and machine learning, we face the problem of missing data: for some observations, only partial information is available. In an expressive typed programming language such as ours, we can model missing data conveniently using the data type $\mathbf{maybe}(\tau) \stackrel{\text{def}}{=} \{\mathrm{Nothing}\ (\,) \mid \mathrm{Just}\ \tau\}$. In the context of a neural network, one might use it as follows. First, define some helper functions

$$\begin{aligned}
\mathrm{fromMaybe} &\stackrel{\text{def}}{=} \lambda x. \lambda m.\ \mathbf{case}\ m\ \mathbf{of}\ \{\mathrm{Nothing}\ \_ \to x \mid \mathrm{Just}\ x' \to x'\}\\
\mathrm{fromMaybe}^n &\stackrel{\text{def}}{=} \lambda \langle x_1, \dots, x_n \rangle. \lambda \langle m_1, \dots, m_n \rangle.\ \langle \mathrm{fromMaybe}\ x_1\ m_1, \dots, \mathrm{fromMaybe}\ x_n\ m_n \rangle\\
&\qquad\quad :\ \mathbf{real}^n \to (\mathbf{maybe}(\mathbf{real}))^n \to \mathbf{real}^n
\end{aligned}$$

$$\mathrm{map} \stackrel{\text{def}}{=} \lambda f. \lambda l.\ \mathbf{fold}\ (x_1, x_2).\ f\, x_1 :: x_2\ \mathbf{over}\ l\ \mathbf{from}\ [\,] \;:\; (\tau \to \sigma) \to \mathbf{list}(\tau) \to \mathbf{list}(\sigma)$$

Given a neural network $f : (\mathbf{list}(\mathbf{real}^k) * P) \to \mathbf{real}$, we can build a new one that operates on a data set for which some covariates (features) are missing, by passing in default values to replace the missing covariates:

$$\lambda \langle l, \langle m, w \rangle \rangle.\ f \langle \mathrm{map}\ (\mathrm{fromMaybe}^k\ m)\ l,\ w \rangle \;:\; \big( \mathbf{list}((\mathbf{maybe}(\mathbf{real}))^k) * (\mathbf{real}^k * P) \big) \to \mathbf{real}$$

Then, given a data set l with missing covariates, we can perform automatic differentiation on this network to optimize, simultaneously, the ordinary network parameters w and the default values for missing covariates m.
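The missing-data helpers translate directly, with Python's `None` playing the role of `Nothing` (the names `from_maybe` and `from_maybe_n` are ours; the paper's language uses the `maybe(real)` variant type instead):

```python
def from_maybe(default, m):
    """case m of {Nothing _ -> default | Just x -> x}"""
    return default if m is None else m

def from_maybe_n(defaults, ms):
    """Fill each missing covariate with its (trainable) default value."""
    return [from_maybe(d, m) for d, m in zip(defaults, ms)]

# A data set with missing covariates, and default values to fill them in:
dataset  = [[1.0, None, 3.0], [None, 5.0, 6.0]]
defaults = [9.0, 8.0, 7.0]

filled = [from_maybe_n(defaults, row) for row in dataset]
print(filled)   # [[1.0, 8.0, 3.0], [9.0, 5.0, 6.0]]
```

In the differentiable setting the defaults would themselves be dual pairs, so their gradients can be optimized alongside the network weights.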

**Semantics.** In § 3 we gave a denotational semantics for the simple language in diffeological spaces. This extends to the language in this section, as follows. As before, each type τ is interpreted as a diffeological space, which is a set equipped with a family of plots:

**–** A variant type $\{\ell_1\ \tau_1 \mid \dots \mid \ell_n\ \tau_n\}$ is interpreted as the disjoint union $\llbracket \{\ell_1\ \tau_1 \mid \dots \mid \ell_n\ \tau_n\} \rrbracket \stackrel{\text{def}}{=} \coprod_{i=1}^n \llbracket \tau_i \rrbracket$, with $U$-plots

$$\mathcal{P}^U_{\llbracket \{\ell_1\ \tau_1 \mid \dots \mid \ell_n\ \tau_n\} \rrbracket} \stackrel{\text{def}}{=} \Big\{ \big[\, U_j \xrightarrow{f_j} \llbracket \tau_j \rrbracket \hookrightarrow \textstyle\coprod_{i=1}^n \llbracket \tau_i \rrbracket \,\big]_{j=1}^n \ \Big|\ U = \textstyle\coprod_{j=1}^n U_j,\ f_j \in \mathcal{P}^{U_j}_{\llbracket \tau_j \rrbracket} \Big\}.$$

**–** A list type $\mathbf{list}(\tau)$ is interpreted as the set of lists, $\llbracket \mathbf{list}(\tau) \rrbracket \stackrel{\text{def}}{=} \coprod_{i=0}^{\infty} \llbracket \tau \rrbracket^i$, with $U$-plots

$$\mathcal{P}^U_{\llbracket \mathbf{list}(\tau) \rrbracket} \stackrel{\text{def}}{=} \Big\{ \big[\, U_j \xrightarrow{f_j} \llbracket \tau \rrbracket^j \hookrightarrow \textstyle\coprod_{i=0}^{\infty} \llbracket \tau \rrbracket^i \,\big]_{j=0}^{\infty} \ \Big|\ U = \textstyle\coprod_{j=0}^{\infty} U_j,\ f_j \in \mathcal{P}^{U_j}_{\llbracket \tau \rrbracket^j} \Big\}.$$

The constructors and destructors for variants and lists are interpreted as in the usual set theoretic semantics. It is routine to show inductively that these interpretations are smooth. Thus every term $\Gamma \vdash t : \tau$ in the extended language is interpreted as a smooth function $\llbracket t \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket \tau \rrbracket$ between diffeological spaces.

(In this section we focused on a language with lists, but other inductive types are easily interpreted in the category of diffeological spaces in much the same way; the categorically minded reader may regard this as a consequence of **Diff** being a concrete Grothendieck quasitopos, e.g. [3].)

## **5 Categorical analysis of forward AD and its correctness**

This section has three parts. First, we give a categorical account of the functoriality of AD (Ex. 8). Then we introduce our gluing construction, and relate it to the correctness of AD (dgm. (3)). Finally, we state and prove a correctness theorem for all first order types by considering a category of manifolds (Th. 2).

**Syntactic categories.** Our language induces a syntactic category as follows.

**Definition 2.** Let **Syn** be the category whose objects are types, and where a morphism τ → σ is a term in context x : τ t : σ modulo the βη-laws (Fig. 4). Composition is by substitution.

For simplicity, we do not impose arithmetic identities such as x + y = y + x in **Syn**. As is standard, this category has the following universal property.

**Lemma 2 (e.g. [27]).** For every bicartesian closed category $\mathcal{C}$ with list objects, every object $F(\mathbf{real}) \in \mathcal{C}$, and all morphisms $F(\underline{c}) \in \mathcal{C}(1, F(\mathbf{real}))$, $F(+), F(*) \in \mathcal{C}(F(\mathbf{real}) \times F(\mathbf{real}), F(\mathbf{real}))$ and $F(\varsigma) \in \mathcal{C}(F(\mathbf{real}), F(\mathbf{real}))$, there is a unique functor $F : \mathbf{Syn} \to \mathcal{C}$ respecting the interpretation and preserving the bicartesian closed structure as well as list objects.

Proof (notes). The functor $F : \mathbf{Syn} \to \mathcal{C}$ is a canonical denotational semantics for the language, interpreting types as objects of $\mathcal{C}$ and terms as morphisms. For instance, $F(\tau \to \sigma) \stackrel{\text{def}}{=} (F\tau \to F\sigma)$, the function space in the category $\mathcal{C}$, and $F(t\, s)$ is defined as the composite $\langle Ft, Fs \rangle; \mathrm{eval}$. When $\mathcal{C} = \mathbf{Diff}$, the denotational semantics of the language in diffeological spaces (§ 3, 4) can be understood as the unique structure preserving functor $\llbracket - \rrbracket : \mathbf{Syn} \to \mathbf{Diff}$ satisfying $\llbracket \mathbf{real} \rrbracket = \mathbb{R}$, $\llbracket \varsigma \rrbracket = \varsigma$, and so on.

Example 8 (Canonical definition of forward AD). The forward AD macro $\overrightarrow{\mathcal{D}}$ (§ 2, 4) arises as a canonical cartesian closed functor on **Syn**. Consider the unique cartesian closed functor $F : \mathbf{Syn} \to \mathbf{Syn}$ such that $F(\mathbf{real}) = (\mathbf{real} * \mathbf{real})$, $F(\underline{c}) = \overrightarrow{\mathcal{D}}(\underline{c})$, $F(\varsigma) = \overrightarrow{\mathcal{D}}(\varsigma(x))$, and

$$F(+) = \big( z : F(\mathbf{real}) * F(\mathbf{real}) \vdash \mathbf{case}\ z\ \mathbf{of}\ \langle x, y \rangle \to \overrightarrow{\mathcal{D}}(x + y) : F(\mathbf{real}) \big)$$
$$F(*) = \big( z : F(\mathbf{real}) * F(\mathbf{real}) \vdash \mathbf{case}\ z\ \mathbf{of}\ \langle x, y \rangle \to \overrightarrow{\mathcal{D}}(x * y) : F(\mathbf{real}) \big)$$

Then for any type $\tau$, $F(\tau) = \overrightarrow{\mathcal{D}}(\tau)$, and for any term $x : \tau \vdash t : \sigma$, $F(t) = \overrightarrow{\mathcal{D}}(t)$ as morphisms $F(\tau) \to F(\sigma)$ in the syntactic category.

**Categorical gluing and logical relations.** Gluing is a method for building new categorical models which has been used for many purposes, including logical relations and realizability [24]. Our logical relations argument in the proof of Th. 1 can be understood in this setting. In this subsection, for the categorically minded, we explain this, and in doing so we quickly recover a correctness result for the more general language in § 4 and for arbitrary first order types.

We define a category $\mathbf{Gl}_U$ whose objects are triples $(X, X', S)$, where $X$ and $X'$ are diffeological spaces and $S \subseteq \mathcal{P}^U_X \times \mathcal{P}^U_{X'}$ is a relation between their $U$-plots. A morphism $(X, X', S) \to (Y, Y', T)$ is a pair of smooth functions

$$\begin{array}{l}
\mathbf{case}\ \langle t_1, \dots, t_n \rangle\ \mathbf{of}\ \langle x_1, \dots, x_n \rangle \to s \;=\; s[t_1/x_1, \dots, t_n/x_n]\\[2pt]
s[t/y] \;\stackrel{\#x_1,\dots,x_n}{=}\; \mathbf{case}\ t\ \mathbf{of}\ \langle x_1, \dots, x_n \rangle \to s[\langle x_1, \dots, x_n \rangle/y]\\[2pt]
\mathbf{case}\ \tau.\ell_i\ t\ \mathbf{of}\ \{\ell_1\ x_1 \to s_1 \mid \dots \mid \ell_n\ x_n \to s_n\} \;=\; s_i[t/x_i]\\[2pt]
s[t/y] \;\stackrel{\#x_1,\dots,x_n}{=}\; \mathbf{case}\ t\ \mathbf{of}\ \{\ell_1\ x_1 \to s[\tau.\ell_1\ x_1/y] \mid \dots \mid \ell_n\ x_n \to s[\tau.\ell_n\ x_n/y]\}\\[2pt]
\mathbf{fold}\ (x_1, x_2).t\ \mathbf{over}\ [\,]\ \mathbf{from}\ r \;=\; r\\[2pt]
\mathbf{fold}\ (x_1, x_2).t\ \mathbf{over}\ s_1 :: s_2\ \mathbf{from}\ r \;=\; t[s_1/x_1,\ \mathbf{fold}\ (x_1, x_2).t\ \mathbf{over}\ s_2\ \mathbf{from}\ r\,/x_2]\\[2pt]
u = s[[\,]/y],\ r[s/x_2] = s[x_1 :: y/y] \;\Rightarrow\; s[t/y] \stackrel{\#x_1,x_2}{=} \mathbf{fold}\ (x_1, x_2).r\ \mathbf{over}\ t\ \mathbf{from}\ u\\[2pt]
(\lambda x. t)\ s \;=\; t[s/x] \qquad\qquad t \;\stackrel{\#x}{=}\; \lambda x.\ t\ x
\end{array}$$

We write $\stackrel{\#x_1,\dots,x_n}{=}$ to indicate that the variables $x_1, \dots, x_n$ are not free in the left hand side.

**Fig. 4.** Standard βη-laws (e.g. [27]) for products, functions, variants and lists.

$f : X \to Y$, $f' : X' \to Y'$, such that if $(g, g') \in S$ then $(g; f, g'; f') \in T$. The idea is that this is a semantic domain where we can simultaneously interpret the language and its automatic derivatives.

**Proposition 1.** The category **Gl**<sup>U</sup> is bicartesian closed, has list objects, and the projection functor proj : **Gl**<sup>U</sup> → **Diff** × **Diff** preserves this structure.

Proof (notes). The category **Gl**<sup>U</sup> is a full subcategory of the comma category id**Set** ↓ **Diff**(U, −) × **Diff**(U, −). The result thus follows by the general theory of categorical gluing (e.g. [17, Lemma 15]).

We give a semantics $\llbracket - \rrbracket = (\llbracket - \rrbracket_0, \llbracket - \rrbracket_1, S_{(-)})$ for the language in $\mathbf{Gl}_{\mathbb{R}}$, interpreting types $\tau$ as objects $(\llbracket \tau \rrbracket_0, \llbracket \tau \rrbracket_1, S_\tau)$, and terms as morphisms. We let $\llbracket \mathbf{real} \rrbracket_0 \stackrel{\text{def}}{=} \mathbb{R}$ and $\llbracket \mathbf{real} \rrbracket_1 \stackrel{\text{def}}{=} \mathbb{R}^2$, with the relation $S_{\mathbf{real}} \stackrel{\text{def}}{=} \{(f, (f, \nabla f)) \mid f : \mathbb{R} \to \mathbb{R} \text{ smooth}\}$. We interpret the constants $\underline{c}$ as pairs, $\llbracket \underline{c} \rrbracket_0 \stackrel{\text{def}}{=} c$ and $\llbracket \underline{c} \rrbracket_1 \stackrel{\text{def}}{=} (c, 0)$, and we interpret $+$, $*$, $\varsigma$ in the standard way (meaning, like $\llbracket - \rrbracket$) in $\llbracket - \rrbracket_0$, but according to the derivatives in $\llbracket - \rrbracket_1$; for instance, $\llbracket * \rrbracket_1 : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$ is

$$\llbracket * \rrbracket_1((x, x'), (y, y')) \stackrel{\text{def}}{=} (x y,\ x y' + x' y).$$

At this point one checks that these interpretations are indeed morphisms in **Gl**R. This amounts to checking that these interpretations are dual numbers representations in the sense of (2). The remaining constructions of the language are interpreted using the categorical structure of **Gl**R, following Lem. 2.
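This morphism check for $\llbracket * \rrbracket_1$ can be tested numerically: on a pair of curves related by $S_{\mathbf{real}}$, i.e. $(f, \nabla f)$ and $(g, \nabla g)$, the output must again be related, i.e. equal to $(f \cdot g, \nabla(f \cdot g))$, which is the product rule. A hedged sketch (names ours):

```python
import math

def mul1(a, b):
    """[*]_1((x, x'), (y, y')) = (x*y, x*y' + x'*y)."""
    (x, dx), (y, dy) = a, b
    return (x * y, x * dy + dx * y)

f, df = math.sin, math.cos                      # a smooth curve and its derivative
g, dg = (lambda u: u * u), (lambda u: 2 * u)    # another one

for u in [0.0, 0.5, 1.3]:
    val, tan = mul1((f(u), df(u)), (g(u), dg(u)))
    assert abs(val - f(u) * g(u)) < 1e-12
    # nabla(f*g)(u) = f'(u)*g(u) + f(u)*g'(u):
    assert abs(tan - (df(u) * g(u) + f(u) * dg(u))) < 1e-12
```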

Notice that the diagram below commutes. One can check this by hand or note that it follows from the initiality of **Syn** (Lem. 2): all the functors preserve all the structure.

$$\begin{array}{ccc}
\mathbf{Syn} & \xrightarrow{\ (\mathrm{id},\ \overrightarrow{\mathcal{D}}(-))\ } & \mathbf{Syn} \times \mathbf{Syn}\\
{\scriptstyle \llbracket - \rrbracket} \big\downarrow & & \big\downarrow {\scriptstyle \llbracket - \rrbracket \times \llbracket - \rrbracket}\\
\mathbf{Gl}_{\mathbb{R}} & \xrightarrow{\ \mathrm{proj}\ } & \mathbf{Diff} \times \mathbf{Diff}
\end{array} \tag{3}$$

We thus arrive at a restatement of the correctness theorem (Th. 1), which holds even for the extended language with variants and lists, because for any $x_1, \dots, x_n : \mathbf{real} \vdash t : \mathbf{real}$, the pair of interpretations $(\llbracket t \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket)$ is in the image of the projection $\mathbf{Gl}_{\mathbb{R}} \to \mathbf{Diff} \times \mathbf{Diff}$, and hence $\llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket$ is a dual numbers encoding of $\llbracket t \rrbracket$.

**Correctness at all first order types, via manifolds.** We now generalize Theorem 1 to hold at all first order types, not just the reals. To do this, we need to define the derivative of a smooth map between the interpretations of first order types. We do this by recalling the well known theory of manifolds and tangent bundles.

For our purposes, a smooth manifold $M$ is a second-countable Hausdorff topological space together with a smooth atlas: an open cover $\mathcal{U}$ together with homeomorphisms $(\varphi_U : U \to \mathbb{R}^{n(U)})_{U \in \mathcal{U}}$ (called charts) such that $\varphi_U^{-1}; \varphi_V$ is smooth on its domain of definition for all $U, V \in \mathcal{U}$. A function $f : M \to N$ between manifolds is smooth if $\varphi_U^{-1}; f; \psi_V$ is smooth for all charts $\varphi_U$ and $\psi_V$ of $M$ and $N$, respectively. Let us write **Man** for this category.

Our manifolds are slightly unusual because different charts in an atlas may have different finite dimension n(U). Thus we consider manifolds with dimensions that are potentially unbounded, albeit locally finite. This does not affect the theory of differential geometry as far as we need it here.

Each open subset of $\mathbb{R}^n$ can be regarded as a manifold. This lets us regard the category of manifolds **Man** as a full subcategory of the category of diffeological spaces: we consider a manifold $(X, \{\varphi_U\}_U)$ as a diffeological space with the same carrier set $X$ and with plots $\mathcal{P}^U_X$ the smooth functions in $\mathbf{Man}(U, X)$. A function $X \to Y$ is smooth in the sense of manifolds if and only if it is smooth in the sense of diffeological spaces [16]. For the categorically minded reader, this means that we have a full embedding of **Man** into **Diff**. Moreover, the natural interpretation of the first order fragment of our language in **Man** coincides with that in **Diff**. That is, the embedding of **Man** into **Diff** preserves finite products and countable coproducts (hence initial algebras of polynomial endofunctors).

**Proposition 2.** Suppose that a type $\tau$ is first order, i.e. it is built just from reals, products, variants, and lists (or, again, arbitrary inductive types), and not function types. Then the diffeological space $\llbracket \tau \rrbracket$ is a manifold.

Proof (notes). This is proved by induction on the structure of types. In fact, one may show that every such $\llbracket \tau \rrbracket$ is isomorphic to a manifold of the form $\coprod_{i=1}^{n} \mathbb{R}^{d_i}$, where the bound $n$ is either finite or $\infty$, but this isomorphism is typically not an identity function.

The restriction to first order types is necessary because, e.g., the space $\llbracket \mathbf{real} \to \mathbf{real} \rrbracket$ is not a manifold, by a Borsuk–Ulam argument (see [15], Appx. A).

We recall that the derivative of any morphism $f : M \to N$ of manifolds is given as follows. For each point $x$ in a manifold $M$, define the tangent space $T_x M$ to be the set $\{\gamma \in \mathbf{Man}(\mathbb{R}, M) \mid \gamma(0) = x\}/{\sim}$ of equivalence classes $[\gamma]$ of smooth curves $\gamma$ in $M$ based at $x$, where we identify $\gamma_1 \sim \gamma_2$ iff $\nabla(\gamma_1; f)(0) = \nabla(\gamma_2; f)(0)$ for all smooth $f : M \to \mathbb{R}$. The tangent bundle of $M$ is the set $T(M) \stackrel{\text{def}}{=} \coprod_{x \in M} T_x(M)$. The charts of $M$ equip $T(M)$ with a canonical manifold structure. Then for smooth $f : M \to N$, the derivative $T(f) : T(M) \to T(N)$ is defined as $T(f)(x, [\gamma]) \stackrel{\text{def}}{=} (f(x), [\gamma; f])$. All told, the derivative is a functor $T : \mathbf{Man} \to \mathbf{Man}$.
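To make the abstract definition concrete, here is a short worked instance (standard differential geometry, not taken from the text): for cartesian spaces the tangent bundle is a product, and $T(f)$ acts by the Jacobian.

```latex
T(\mathbb{R}^n) \;\cong\; \mathbb{R}^n \times \mathbb{R}^n,
\qquad
T_x\mathbb{R}^n \ni [\gamma] \;\longmapsto\; (x,\ \nabla\gamma(0)),
\qquad
T(f)(x, v) \;=\; \big(f(x),\ J_f(x)\, v\big)
```

where $J_f(x)$ is the Jacobian matrix of a smooth $f : \mathbb{R}^n \to \mathbb{R}^m$ at $x$; for $n = m = 1$ this is exactly the dual numbers transformation $(x, x') \mapsto (f(x), f'(x)\, x')$.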

As is standard, we can understand the tangent bundle of a composite space in terms of that of its parts.

**Lemma 3.** There are canonical isomorphisms $T(\coprod_{i=1}^{\infty} M_i) \cong \coprod_{i=1}^{\infty} T(M_i)$ and $T(M_1 \times \dots \times M_n) \cong T(M_1) \times \dots \times T(M_n)$.

We define a canonical isomorphism $\phi^{\overrightarrow{\mathcal{D}}T}_\tau : \llbracket \overrightarrow{\mathcal{D}}(\tau) \rrbracket \to T(\llbracket \tau \rrbracket)$ for every first order type $\tau$, by induction on the structure of types. We let $\phi^{\overrightarrow{\mathcal{D}}T}_{\mathbf{real}} : \llbracket \overrightarrow{\mathcal{D}}(\mathbf{real}) \rrbracket \to T(\llbracket \mathbf{real} \rrbracket)$ be given by $\phi^{\overrightarrow{\mathcal{D}}T}_{\mathbf{real}}(x, x') \stackrel{\text{def}}{=} (x, [t \mapsto x + x' t])$. For the other types, we use Lemma 3. We can now phrase correctness at all first order types.

**Theorem 2 (Semantic correctness of** $\overrightarrow{\mathcal{D}}$ **(full)).** For any first order type $\tau$, any first order context $\Gamma$ and any term $\Gamma \vdash t : \tau$, the syntactic translation $\overrightarrow{\mathcal{D}}$ coincides with the tangent bundle functor, modulo these canonical isomorphisms:

$$\llbracket \overrightarrow{\mathcal{D}}(t) \rrbracket; \phi^{\overrightarrow{\mathcal{D}}T}_\tau \;=\; \phi^{\overrightarrow{\mathcal{D}}T}_\Gamma; T(\llbracket t \rrbracket).$$

Proof (notes). For any curve $\gamma \in \mathbf{Man}(\mathbb{R}, M)$, let $\bar\gamma \in \mathbf{Man}(\mathbb{R}, T(M))$ be the tangent curve, given by $\bar\gamma(x) = (\gamma(x), [t \mapsto \gamma(x + t)])$. First, we note that a smooth map $h : T(M) \to T(N)$ is of the form $T(g)$ for some $g : M \to N$ if for all smooth curves $\gamma : \mathbb{R} \to M$ we have $\bar\gamma; h = \overline{(\gamma; g)} : \mathbb{R} \to T(N)$. This generalizes (2). Second, for any first order type $\tau$, $S_\tau = \{(f, \tilde f) \mid \tilde f; \phi^{\overrightarrow{\mathcal{D}}T}_\tau = \bar f\}$. This is shown by induction on the structure of types. We conclude the theorem from diagram (3) by putting these two observations together.

## **6 A continuation-based AD algorithm**

We now illustrate the flexibility of our framework by briefly describing an alternative syntactic translation $\overleftarrow{\mathcal{D}}_\rho$. This alternative translation uses aspects of continuation passing style, inspired by recent developments in reverse mode AD [34, 5]. In brief, $\overleftarrow{\mathcal{D}}_\rho$ works by $\overleftarrow{\mathcal{D}}_\rho(\mathbf{real}) = \mathbf{real} * (\mathbf{real} \to \rho)$. Thus, instead of using a pair of a number and its tangent, we use a pair of a number and a continuation. The answer type $\rho = \mathbf{real}^k$ needs to have the structure of a vector space, and the continuations that we consider will turn out to be linear maps. Because we work in continuation passing style, the chain rule is applied contravariantly. If the reader is familiar with reverse-mode AD algorithms, they may think of the dimension k as the number of memory cells used to store the result.

Computing the whole gradient of a term $x_1 : \mathbf{real}, \ldots, x_k : \mathbf{real} \vdash t : \mathbf{real}$ at once is then achieved by running $\overleftarrow{\mathcal{D}}_k(t)$ on a k-tuple of basis vectors for $\mathbf{real}^k$.
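As an illustration, the pair-of-value-and-linear-continuation reading of $\overleftarrow{\mathcal{D}}_k$ can be sketched in Python. All names here are hypothetical; this is a model of the idea on a fragment with variables, constants, addition and multiplication, not the paper's idealized language:

```python
# A real is interpreted as a pair (value, continuation), where the
# continuation is a linear map real -> real^k that accumulates the
# variable's contribution to the gradient (tangents are lists of floats).

def const(c, k):
    # constants contribute nothing to the gradient
    return (c, lambda z: [0.0] * k)

def var(v, i, k):
    # lamR-style encoding of the i-th of k inputs: a one-hot linear map
    return (v, lambda z: [z if j == i else 0.0 for j in range(k)])

def add(t, s):
    (x, xk), (y, yk) = t, s
    return (x + y, lambda z: [a + b for a, b in zip(xk(z), yk(z))])

def mul(t, s):
    # chain rule, applied contravariantly through the continuations
    (x, xk), (y, yk) = t, s
    return (x * y, lambda z: [a + b for a, b in zip(xk(y * z), yk(x * z))])

def grad(t):
    # evR at ground type: run the reverse pass on 1 to read off the gradient
    value, cont = t
    return value, cont(1.0)

# Gradient of t = x * y + x at (x, y) = (3, 4): value 15, gradient (5, 3)
x, y = var(3.0, 0, 2), var(4.0, 1, 2)
value, gradient = grad(add(mul(x, y), x))
```

Running `grad` on a term built from basis-vector encodings of the inputs performs the forward pass once and then one reverse pass, mirroring the macro above.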

We define the continuation-based AD macro $\overleftarrow{\mathcal{D}}_k$ on types and terms as the unique structure preserving functor $\mathbf{Syn} \to \mathbf{Syn}$ with $\overleftarrow{\mathcal{D}}_k(\mathbf{real}) = \mathbf{real} * (\mathbf{real} \to \mathbf{real}^k)$ and

$$\begin{array}{l}
\overleftarrow{\mathcal{D}}_k(\underline{c}) \stackrel{\text{def}}{=} \langle \underline{c}, \lambda z. \langle \underline{0}, \ldots, \underline{0} \rangle \rangle \\
\overleftarrow{\mathcal{D}}_k(t+s) \stackrel{\text{def}}{=} \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(t)\ \mathbf{of}\ \langle x, x' \rangle \to \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(s)\ \mathbf{of}\ \langle y, y' \rangle \to \langle x+y, \lambda z.\, x'\,z + y'\,z \rangle \\
\overleftarrow{\mathcal{D}}_k(t*s) \stackrel{\text{def}}{=} \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(t)\ \mathbf{of}\ \langle x, x' \rangle \to \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(s)\ \mathbf{of}\ \langle y, y' \rangle \to \\
\qquad \langle x*y, \lambda z.\, x'(y*z) + y'(x*z) \rangle \\
\overleftarrow{\mathcal{D}}_k(\varsigma(t)) \stackrel{\text{def}}{=} \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(t)\ \mathbf{of}\ \langle x, x' \rangle \to \mathbf{let}\ y = \varsigma(x)\ \mathbf{in}\ \langle y, \lambda z.\, x'(y*(1-y)*z) \rangle.
\end{array}$$

Here, addition of terms of type $\mathbf{real}^k$ is defined pointwise: $\langle x_1, \ldots, x_k \rangle + y \stackrel{\text{def}}{=} \mathbf{case}\ y\ \mathbf{of}\ \langle y_1, \ldots, y_k \rangle \to \langle x_1 + y_1, \ldots, x_k + y_k \rangle$. (We could easily expand this definition by making $\overleftarrow{\mathcal{D}}_k$ preserve all other term and type formers, as we did for $\overrightarrow{\mathcal{D}}$.) Note that the corresponding scheme for an arbitrary n-ary operation op would be (cf. the scheme for forward AD in §4)

$$\begin{array}{l}
\overleftarrow{\mathcal{D}}_k(\mathsf{op}(t_1, \ldots, t_n)) \stackrel{\text{def}}{=} \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(t_1)\ \mathbf{of}\ \langle x_1, x'_1 \rangle \to \ldots \to \mathbf{case}\ \overleftarrow{\mathcal{D}}_k(t_n)\ \mathbf{of}\ \langle x_n, x'_n \rangle \to \\
\qquad \langle \mathsf{op}(x_1, \ldots, x_n), \lambda z. \sum_{i=1}^n x'_i(\partial_i\mathsf{op}(x_1, \ldots, x_n) * z) \rangle.
\end{array}$$

The idea is that ←−Dk(t) is a higher order function that simultaneously computes t (the forward pass) and defines as a continuation the reverse pass which computes the gradient. In order to actually run the algorithm, we need two auxiliary definitions

$$\begin{array}{l}
\mathsf{lamR}^k_{\mathbf{real}} \stackrel{\text{def}}{=} \lambda z.\ \mathbf{case}\ z\ \mathbf{of}\ \langle x, x' \rangle \to \mathbf{case}\ x'\ \mathbf{of}\ \langle x'_1, \ldots, x'_k \rangle \to \\
\qquad \langle x, \lambda y. \langle x'_1 * y, \ldots, x'_k * y \rangle \rangle \ :\ \overrightarrow{\mathcal{D}}_k(\mathbf{real}) \to \overleftarrow{\mathcal{D}}_k(\mathbf{real}) \\
\mathsf{evR}^k_{\mathbf{real}} \stackrel{\text{def}}{=} \lambda z.\ \mathbf{case}\ z\ \mathbf{of}\ \langle x, x' \rangle \to \langle x, x'\,\underline{1} \rangle \ :\ \overleftarrow{\mathcal{D}}_k(\mathbf{real}) \to \overrightarrow{\mathcal{D}}_k(\mathbf{real}).
\end{array}$$

Here, $\overrightarrow{\mathcal{D}}_k$ is a macro on types (and terms) with exactly the same inductive definition as $\overrightarrow{\mathcal{D}}$ except for the base case $\overrightarrow{\mathcal{D}}_k(\mathbf{real}) = \mathbf{real} * \mathbf{real}^k$. By noting that both $\overrightarrow{\mathcal{D}}_k$ and $\overleftarrow{\mathcal{D}}_k$ preserve all type formers, we can extend these definitions to all first order types τ: $z : \overrightarrow{\mathcal{D}}_k(\tau) \vdash \mathsf{lamR}^k_\tau(z) : \overleftarrow{\mathcal{D}}_k(\tau)$ and $z : \overleftarrow{\mathcal{D}}_k(\tau) \vdash \mathsf{evR}^k_\tau(z) : \overrightarrow{\mathcal{D}}_k(\tau)$. We can think of $\mathsf{lamR}^k_\tau(z)$ as encoding k tangent vectors $z : \overrightarrow{\mathcal{D}}_k(\tau)$ as a closure, so it is suitable for running $\overleftarrow{\mathcal{D}}_k(t)$ on, and of $\mathsf{evR}^k_\tau(z)$ as actually evaluating the reverse pass defined by $z : \overleftarrow{\mathcal{D}}_k(\tau)$ and returning the result as k tangent vectors. The idea is that given some $x : \tau \vdash t : \sigma$ between first order types τ, σ, we run our continuation-based AD by running $\mathsf{evR}^k_\sigma(\overleftarrow{\mathcal{D}}_k(t)[\mathsf{lamR}^k_\tau(z)/x])$.

The correctness proof closely follows that for forward AD. In particular, one defines a binary logical relation $[\![\mathbf{real}]\!]_{r,k} = (\mathbb{R},\ \mathbb{R} \times (\mathbb{R}^k)^{\mathbb{R}},\ S^{r,k}_{\mathbf{real}})$, where $S^{r,k}_{\mathbf{real}} = \{(f,\ x \mapsto (f(x),\ y \mapsto (\partial_1 f(x) * y, \ldots, \partial_k f(x) * y))) \mid f \in \mathcal{P}^{\mathbb{R}^k}_{\mathbb{R}}\}$, on the plots $\mathcal{P}^{\mathbb{R}^k}_{\mathbb{R}} \times \mathcal{P}^{\mathbb{R}^k}_{\mathbb{R} \times ((\mathbb{R}^k)^{\mathbb{R}})}$, and verifies that $[\![\underline{c}]\!] \times [\![\overleftarrow{\mathcal{D}}_k(\underline{c})]\!]$, $[\![x+y]\!] \times [\![\overleftarrow{\mathcal{D}}_k(x+y)]\!]$, $[\![x*y]\!] \times [\![\overleftarrow{\mathcal{D}}_k(x*y)]\!]$ and $[\![\varsigma(x)]\!] \times [\![\overleftarrow{\mathcal{D}}_k(\varsigma(x))]\!]$ respect this logical relation. It follows that this relation extends to a functor $[\![-]\!]_{r,k} : \mathbf{Syn} \to \mathbf{Gl}_{\mathbb{R}^k}$ such that $[\![\mathrm{id}]\!] \times [\![\overleftarrow{\mathcal{D}}_k]\!]$ factors over $[\![-]\!]_{r,k}$, implying the correctness of the continuation-based AD by the following lemma.

**Lemma 4.** For all first order types τ (i.e. types not involving function types), we have that $\mathsf{evR}^k_\tau(\mathsf{lamR}^k_\tau(t)) = t$.

Proof (notes). This follows by an induction on the structure of τ. The idea is that $\mathsf{lamR}^k_\tau$ embeds reals into function spaces as linear maps, which is undone by $\mathsf{evR}^k_\tau$ by evaluating the linear maps at 1.
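The base case of this round trip can be sketched concretely (hypothetical Python names; a k-tuple of tangents is modelled as a list of floats):

```python
# lamR at real: embed k tangents as a linear map real -> real^k.
def lamR(pair):
    x, tangents = pair
    return (x, lambda y: [d * y for d in tangents])

# evR at real: undo the embedding by evaluating the linear map at 1.
def evR(pair):
    x, cont = pair
    return (x, cont(1.0))

t = (3.0, [1.0, 2.0])
roundtrip = evR(lamR(t))   # recovers t, as Lemma 4 states for real
```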

To phrase correctness in this setting, however, we need a few definitions. Keeping in mind the canonical projection $T(M) \to M$, we define $T^k(M)$ as the k-fold categorical pullback (fibre product) $T(M) \times_M \ldots \times_M T(M)$. To be explicit, $T^k_x M$ consists of k-tuples of tangent vectors at the base point x. Again, $T^k$ extends to a functor $\mathbf{Man} \to \mathbf{Man}$ by defining $T^k(f)(x, (v_1, \ldots, v_k)) \stackrel{\text{def}}{=} (f(x), (T_x(f)(v_1), \ldots, T_x(f)(v_k)))$. As $T^k$ preserves countable coproducts and finite products (like T), it follows that the isomorphisms $\phi^{\overrightarrow{\mathcal{D}}T}_\tau$ generalize to canonical isomorphisms $\phi^{\overrightarrow{\mathcal{D}}T}_{\tau,k} : [\![\overrightarrow{\mathcal{D}}_k(\tau)]\!] \to T^k([\![\tau]\!])$ for first order types τ. This leads to the following correctness statement for continuation-based AD.

**Theorem 3 (Semantic correctness of** $\overleftarrow{\mathcal{D}}_k$**).** For any ground τ, any first order context Γ and any term $\Gamma \vdash t : \tau$, the syntactic translation $t \mapsto \mathsf{evR}^k_\tau(\overleftarrow{\mathcal{D}}_k(t)[\mathsf{lamR}^k_\Gamma(z)/\ldots])$ coincides with the tangent bundle functor, modulo these canonical isomorphisms:

$$\begin{array}{ccc}
[\![\overrightarrow{\mathcal{D}}_k(\Gamma)]\!] & \xrightarrow{[\![\mathsf{lamR}^k_\Gamma;\ \overleftarrow{\mathcal{D}}_k(t);\ \mathsf{evR}^k_\tau]\!]} & [\![\overrightarrow{\mathcal{D}}_k(\tau)]\!] \\
\phi^{\overrightarrow{\mathcal{D}}T}_{\Gamma,k} \downarrow \cong & & \cong \downarrow \phi^{\overrightarrow{\mathcal{D}}T}_{\tau,k} \\
T^k([\![\Gamma]\!]) & \xrightarrow{T^k([\![t]\!])} & T^k([\![\tau]\!])
\end{array}$$

For example, when τ = **real** and Γ = x, y : **real**, we can run our continuation-based AD to compute the gradient of a program x, y : **real** ⊢ t : **real** at values x = V, y = W by evaluating

$$\mathsf{evR}^2_{\mathbf{real}}\left(\overleftarrow{\mathcal{D}}_2(t)\left[\mathsf{lamR}^2_{\mathbf{real}}\,v/x,\ \mathsf{lamR}^2_{\mathbf{real}}\,w/y\right]\right)\left[\langle V, \langle \underline{1}, \underline{0} \rangle\rangle/v,\ \langle W, \langle \underline{0}, \underline{1} \rangle\rangle/w\right]$$

Indeed,

$$\begin{array}{l}
[\![\mathsf{evR}^2_{\mathbf{real}}(\overleftarrow{\mathcal{D}}_2(t)[\mathsf{lamR}^2_{\mathbf{real}}\,v/x,\ \mathsf{lamR}^2_{\mathbf{real}}\,w/y])]\!]\left[\langle V, \langle \underline{1}, \underline{0} \rangle\rangle/v,\ \langle W, \langle \underline{0}, \underline{1} \rangle\rangle/w\right] = \\
\qquad ([\![t]\!]([\![V]\!], [\![W]\!]),\ \partial_1[\![t]\!]([\![V]\!], [\![W]\!]),\ \partial_2[\![t]\!]([\![V]\!], [\![W]\!])).
\end{array}$$

## **7 Discussion and future work**

**Summary.** We have shown that diffeological spaces provide a denotational semantics for a higher order language with variants and inductive types (§3,4). We have used this to show correctness of a simple AD translation (Thm. 1, Thm. 2). But the method is not tied to this specific translation, as we illustrated in Section 6.

The structure of our elementary correctness argument for Theorem 1 is a typical logical relations proof. As explained in Section 5, this can equivalently be understood as a denotational semantics in a new kind of space obtained by categorical gluing.

Overall, then, there are two logical relations at play. One is in diffeological spaces, which ensures that all definable functions are smooth. The other is in the correctness proof (equivalently in the categorical gluing), which explicitly tracks the derivative of each function, and tracks the syntactic AD even at higher types.

**Connection to the state of the art in AD implementation.** As is common in denotational semantics research, we have here focused on an idealized language and simple translations to illustrate the main aspects of the method. There are a number of points where our approach is simplistic compared to the advanced current practice, as we now explain.

Representation of vectors. In our examples we have treated n-vectors as tuples of length n. This style of programming does not scale to large n. A better solution would be to use array types, following [31]. Our categorical semantics and correctness proofs straightforwardly extend to cover them, in a similar way to our treatment of lists.

Efficient forward-mode AD. For AD to be useful, it must be fast. The syntactic translation −→D that we use is the basis of an efficient AD library [31]. However, numerous optimizations are needed, ranging from algebraic manipulations, to partial evaluations, to the use of an optimizing C compiler. A topic for future work would be to validate some of these manipulations using our semantics. The resulting implementation is performant in experiments [31].

Efficient reverse-mode AD. Our sketch of continuation-based AD is primarily intended to emphasise that our denotational approach is not tied to any specific translation −→D . Nonetheless, it is worth noting that this algorithm shares similarities with advanced reverse-mode implementations: (1) it calculates derivatives in a (contravariant) "reverse pass" in which derivatives of operations are evaluated in the reverse order compared to their order in calculating the function value; (2) it can be used to calculate the full gradient of a function <sup>R</sup><sup>n</sup> <sup>→</sup> <sup>R</sup> in a single reverse pass (while n passes of forward AD would be necessary). However, it lacks important optimizations, and the continuation scales with the size of the input n where it should scale with the size of the output. This adds an important overhead, as pointed out in [26]. Speed being the main attraction of reverse-mode AD, its implementations tend to rely on mutable state, control operators and/or staging [26, 6, 34, 5], which we have not considered here.

Other language features. The idealized languages that we considered so far do not touch on several useful language constructs. For example: the use of functions that are partial (such as division) or partly-smooth (such as ReLU); phenomena such as iteration and recursion; and probabilities. There are suggestions that the denotational approach using diffeological spaces can be adapted to these features using standard categorical methods. We leave this for future work.

**Acknowledgements.** We have benefited from discussing this work with many people, including B. Pearlmutter, O. Kammar, C. Mak, L. Ong, G. Plotkin, A. Shaikhha, J. Sigal, and others. Our work is supported by the Royal Society and by a Facebook Research Award. In the course of this work, MV has also been employed at Oxford (EPSRC Project EP/M023974/1) and at Columbia in the Stan development team. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 895827.

## **References**



## **Deep Induction: Induction Rules for (Truly) Nested Types**

Patricia Johann and Andrew Polonsky Appalachian State University, Boone, NC, USA johannp@appstate.edu, polonskya@appstate.edu

**Abstract.** This paper introduces *deep induction*, and shows that it is the notion of induction most appropriate to nested types and other data types defined over, or mutually recursively with, (other) such types. Standard induction rules induct over only the top-level structure of data, leaving any data internal to the top-level structure untouched. By contrast, deep induction rules induct over *all* of the structured data present. We give a grammar generating a robust class of nested types (and thus ADTs), and develop a fundamental theory of deep induction for them using their recently defined semantics as fixed points of accessible functors on locally presentable categories. We then use our theory to derive deep induction rules for some common ADTs and nested types, and show how these rules specialize to give the standard structural induction rules for these types. We also show how deep induction specializes to solve the long-standing problem of deriving principled and practically useful structural induction rules for bushes and other *truly* nested types. Overall, deep induction opens the way to making induction principles appropriate to richly structured data types available in programming languages and proof assistants. Agda implementations of our development and examples, including two extended case studies, are available.

## **1 Introduction**

This paper is concerned with the problem of inductive reasoning about inductive data types that are defined over, or are defined mutually recursively with, (other) such data types. Examples of such deep data types include, trivially, ordinary algebraic data types (ADTs), such as list and tree types; data types, such as the forest type, whose recursive occurrences appear below other type constructors; simple nested types, such as the type of perfect trees, whose recursive occurrences never appear below their own type constructors; truly<sup>1</sup> nested types, such as the type of bushes (also called bootstrapped heaps by Okasaki [16]), whose recursive occurrences do appear below their own type constructors; and GADTs. Proof assistants, including Coq and Agda, currently provide insufficient support for performing induction over deep data types. The induction rules, if any, they generate for such types induct over only their top-level structures, leaving any data internal to the top-level structure untouched. This paper develops a principle that, by contrast, inducts over all of the structured data present. We call this principle deep induction. Deep induction not only provides general support for solving problems that previously had only (usually quite painful and) ad hoc solutions, but also opens the way for incorporating automatic generation of useful induction rules for deep data types into proof assistants.

<sup>1</sup> Nested types that are defined over themselves are known as *truly nested types*.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 339–358, 2020. https://doi.org/10.1007/978-3-030-45231-5\_18

To illustrate the difference between structural induction and deep induction, note that the data inside a structure of type List a = Nil | Cons a (List a) is treated monolithically (i.e., ignored) by the structural induction rule for lists:

$$\begin{array}{l}
\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{P} : \mathtt{List\ a} \to \mathtt{Set}) \to \mathtt{P\ Nil} \to \\
(\forall(\mathtt{x} : \mathtt{a})\,(\mathtt{xs} : \mathtt{List\ a}) \to \mathtt{P\ xs} \to \mathtt{P\ (Cons\ x\ xs)}) \to \forall(\mathtt{xs} : \mathtt{List\ a}) \to \mathtt{P\ xs}
\end{array}$$

By contrast, the deep induction rule for lists traverses not just the outer list structure with a predicate P, but also each data element of that list with a custom predicate Q:

$$\begin{array}{l}
\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{P} : \mathtt{List\ a} \to \mathtt{Set})\,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \to \\
\mathtt{P\ Nil} \to (\forall(\mathtt{x} : \mathtt{a})\,(\mathtt{xs} : \mathtt{List\ a}) \to \mathtt{Q\ x} \to \mathtt{P\ xs} \to \mathtt{P\ (Cons\ x\ xs)}) \to \\
\forall(\mathtt{xs} : \mathtt{List\ a}) \to \mathtt{List}^\wedge\ \mathtt{Q\ xs} \to \mathtt{P\ xs}
\end{array}$$

Here, List<sup>∧</sup> lifts its argument predicate Q on data of type a to a predicate on data of type List a asserting that Q holds for every element of its argument list. The structural induction rule for lists is, like that for any ADT, recovered by taking the custom predicate in the corresponding deep rule to be λx. True.
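A Bool-valued sketch of this specialization may help (hypothetical Python names; `list_lift` stands in for the Set-valued lifting List<sup>∧</sup>):

```python
# List^ Q holds of a list iff Q holds of every element.
def list_lift(q, xs):
    return all(q(x) for x in xs)

# Taking the custom predicate to be (lambda x: True) makes the extra
# hypothesis List^ Q xs vacuously true, which is how the structural
# rule arises as a special case of the deep one.
vacuous = list_lift(lambda x: True, [1, 2, 3])
```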

A particular advantage of deep induction is that it obviates the need to reflect properties as data types. For example, although the set of primes cannot be defined by an ADT, the primeness predicate Prime on the ADT of natural numbers can be lifted to a predicate List<sup>∧</sup> Prime characterizing lists of primes. Properties can then be proved for lists of primes using the following deep induction rule:

$$\begin{array}{l}
\forall(\mathtt{P} : \mathtt{List\ Nat} \to \mathtt{Set}) \to \mathtt{P\ Nil} \to \\
(\forall(\mathtt{x} : \mathtt{Nat})\,(\mathtt{xs} : \mathtt{List\ Nat}) \to \mathtt{Prime\ x} \to \mathtt{P\ xs} \to \mathtt{P\ (Cons\ x\ xs)}) \to \\
\forall(\mathtt{xs} : \mathtt{List\ Nat}) \to \mathtt{List}^\wedge\ \mathtt{Prime\ xs} \to \mathtt{P\ xs}
\end{array}$$

As we'll see in Sections 3, 4, and 5, the extra flexibility afforded by lifting predicates like Q and Prime on data internal to a structure makes it possible to derive useful induction principles for more complex types, such as truly nested ones.

In each of the above examples, a predicate on the data is lifted to a predicate on the list. This is an example of lifting a predicate on a type in a non-recursive position of an ADT's definition to the entire ADT. However, the predicate to be lifted can also be on the type in a recursive position of a definition — i.e., on the ADT being defined itself — and this ADT can appear below another type constructor in the definition. This is exactly the situation for the ADT Forest a, which appears below the type constructor List in the definition

$$\mathtt{Forest\ a\ =\ FEmpty\ |\ FNode\ a\ (List\ (Forest\ a))}$$

The induction rule Coq generates for forests is

$$\begin{array}{l}
\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{P} : \mathtt{Forest\ a} \to \mathtt{Set}) \to \mathtt{P\ FEmpty} \to \\
(\forall(\mathtt{x} : \mathtt{a})\,(\mathtt{ts} : \mathtt{List\ (Forest\ a)}) \to \mathtt{P\ (FNode\ x\ ts)}) \to \forall(\mathtt{x} : \mathtt{Forest\ a}) \to \mathtt{P\ x}
\end{array}$$

However, this is neither the induction rule we intuitively expect, nor is it expressive enough to prove even basic properties of forests that ought to be amenable to inductive proof. The approach of [11,12] does give the expected rule<sup>2</sup>

<sup>2</sup> This is equivalent to the rule as classically stated in Coq/Isabelle/HOL.

$$\begin{array}{l}
\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{P} : \mathtt{Forest\ a} \to \mathtt{Set}) \to \mathtt{P\ FEmpty} \to \\
(\forall(\mathtt{x} : \mathtt{a})\,(\mathtt{ts} : \mathtt{List\ (Forest\ a)}) \to (\forall(\mathtt{k} < \mathtt{length\ ts}) \to \mathtt{P\ (ts\mathord{!!}k)}) \to \mathtt{P\ (FNode\ x\ ts)}) \to \\
\forall(\mathtt{x} : \mathtt{Forest\ a}) \to \mathtt{P\ x}
\end{array}$$

But to derive it, a technique based on list positions is used to propagate the predicate to be proved over the list of forests that is the second argument to the data constructor FNode. Unfortunately, this technique does not obviously extend to other deep data types, including the type of "generalized forests" introduced in Section 4.4 below, which combines smaller generalized forests into larger ones using a type constructor f potentially different from List. Nevertheless, replacing <sup>∀</sup> (<sup>k</sup> <sup>&</sup>lt; length ts) <sup>→</sup> <sup>P</sup> (ts!!k) in the expected rule with List<sup>∧</sup> P ts, which is equivalent, reveals that it is nothing more than the special case for <sup>Q</sup> <sup>=</sup> <sup>λ</sup>x. True of the following deep induction rule for Forest a:

$$\begin{array}{l}
\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{P} : \mathtt{Forest\ a} \to \mathtt{Set})\,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \to \mathtt{P\ FEmpty} \to \\
(\forall(\mathtt{x} : \mathtt{a})\,(\mathtt{ts} : \mathtt{List\ (Forest\ a)}) \to \mathtt{Q\ x} \to \mathtt{List}^\wedge\ \mathtt{P\ ts} \to \mathtt{P\ (FNode\ x\ ts)}) \to \\
\forall(\mathtt{x} : \mathtt{Forest\ a}) \to \mathtt{Forest}^\wedge\ \mathtt{Q\ x} \to \mathtt{P\ x}
\end{array}$$

When types, like Forest a and List (Forest a) above, are defined by mutual recursion, their (deep) induction rules are defined by mutual recursion as well. For example, the induction rules for the ADTs

```
data Expr = Lit Nat | Add Expr Expr | If BExpr Expr Expr
data BExpr = BLit Bool | And BExpr BExpr | Not BExpr | Equal Expr Expr
```
of integer and boolean expressions are defined by mutual recursion as

```
∀(P : Expr → Set)(Q : BExpr → Set) →
  (∀(n : Nat) → P (Lit n)) →
  (∀(e1 : Expr)(e2 : Expr) → P e1 → P e2 → P (Add e1 e2)) →
  (∀(b : BExpr)(e1 : Expr)(e2 : Expr) → Q b → P e1 → P e2 → P (If b e1 e2)) →
  (∀(b : Bool) → Q (BLit b)) →
  (∀(b1 : BExpr)(b2 : BExpr) → Q b1 → Q b2 → Q (And b1 b2)) →
  (∀(b : BExpr) → Q b → Q (Not b)) →
  (∀(e1 : Expr)(e2 : Expr) → P e1 → P e2 → Q (Equal e1 e2)) →
  (∀(e : Expr) → P e) × (∀(b : BExpr) → Q b)
```
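Any pair of mutually recursive functions over these two types is an instance of this mutual rule. A Python sketch (hypothetical encoding of the constructors as tagged tuples, computing the sizes of the two kinds of expressions by mutual recursion):

```python
# Mutually recursive folds over Expr and BExpr, mirroring the mutual
# induction rule: proving "size_expr is defined" for all Expr requires
# simultaneously proving "size_bexpr is defined" for all BExpr.

def size_expr(e):
    tag = e[0]
    if tag == "Lit":
        return 1
    if tag == "Add":
        return 1 + size_expr(e[1]) + size_expr(e[2])
    if tag == "If":
        return 1 + size_bexpr(e[1]) + size_expr(e[2]) + size_expr(e[3])

def size_bexpr(b):
    tag = b[0]
    if tag == "BLit":
        return 1
    if tag == "And":
        return 1 + size_bexpr(b[1]) + size_bexpr(b[2])
    if tag == "Not":
        return 1 + size_bexpr(b[1])
    if tag == "Equal":
        return 1 + size_expr(b[1]) + size_expr(b[2])

# If (Equal (Lit 1) (Lit 2)) (Lit 3) (Lit 4): size 1 + 3 + 1 + 1 = 6
example = ("If", ("Equal", ("Lit", 1), ("Lit", 2)), ("Lit", 3), ("Lit", 4))
n = size_expr(example)
```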
## **2 The Key Idea**

As the examples of the previous section suggest, the key to deriving deep induction rules from (deep) data type declarations is to parameterize the induction rules not just over a predicate over the top-level data type being defined, but over predicates on the types of primitive data they contain as well. These additional predicates are then lifted to predicates on any internal structures containing these data, and the resulting predicates on these internal structures are lifted to predicates on any internal structures containing structures at the previous level, and so on, until the internal structures at all levels of the data type definition, including the top level, have been so processed. Satisfaction of a predicate by the data at one level of a structure is then conditioned upon satisfaction of the appropriate predicates by all of the data at the preceding level.

The above deep induction rules were all obtained using this technique. For example, the deep induction rule for lists is derived by first noting that structures of type List a contain only data of type a, so that only one additional predicate parameter, which we called Q above, is needed. Then, since the only data structure internal to the type List a is List a itself, Q need only be lifted to lists containing data of type a. This is exactly what List<sup>∧</sup> Q does. Finally, the deep induction rule for lists is obtained by parameterizing the standard one over not just P but also Q, adding the additional hypothesis Q x to its second antecedent, and adding the additional hypothesis List<sup>∧</sup> Q xs to its conclusion.

The deep induction rule for forests is similarly obtained from the Coq-generated rule by first parameterizing over an additional predicate Q on the type a of data stored in the forest, then lifting P to a predicate on lists containing data of type Forest a and Q to forests containing data of type a, and, finally, adding the additional hypotheses Q x and List<sup>∧</sup> P ts to its second antecedent and the additional hypothesis Forest<sup>∧</sup> Q x to its conclusion.

Predicate liftings such as List<sup>∧</sup> and Forest<sup>∧</sup> may either be supplied as primitives, or be generated automatically from the definitions of the types themselves, as described in Section 4. For container types, lifting a predicate amounts to traversing the container and applying the argument predicate pointwise.
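For instance, the traversal implementing Forest<sup>∧</sup> can be sketched in Python (hypothetical encoding: FEmpty as `None`, FNode x ts as a pair of the element and a list of subforests):

```python
# Forest^ Q: lift a predicate Q on the element type pointwise over a
# forest, recursing through the internal List structure.
def forest_lift(q, f):
    if f is None:              # FEmpty
        return True
    x, ts = f                  # FNode x ts
    return q(x) and all(forest_lift(q, t) for t in ts)

# FNode 2 [FNode 4 [], FEmpty]: every stored number is even
evens = (2, [(4, []), None])
ok = forest_lift(lambda n: n % 2 == 0, evens)
```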

Our technique for deriving deep induction rules for ADTs, as well as its generalization to nested types given in Section 3, is both made precise and rigorously justified in Section <sup>4</sup> using the results of [13]. This paper can thus be seen as a concrete application, in the specific category Fam, of the very general semantics developed in [13]; indeed, our induction rules are computed as the interpretations of the syntax for nested types in Fam. A general methodology is extracted in Section 5. The rest of this paper can be read either as "just" describing how to generate deep induction rules in practice, or as also proving that our technique for doing so is provably correct and general. Our Agda code is at [14].

## **3 Extending to Nested Types**

Appropriately generalizing the basic technique of Section 2 derives deep induction rules, and therefore structural induction rules, for nested types, including truly nested types and other deep nested types. Nested types generalize ADTs by allowing elements at one instance of a data type to depend on data at other instances of the same type so that, in effect, the entire family of instances is constructed simultaneously. That is, rather than defining standalone families of inductive types, one for each choice of types to which type constructors like List and Tree are applied, the type constructors for nested types define inductive families of types. The structural induction rule for a nested type must therefore account for its changing type parameters by parameterizing over an appropriately polymorphic predicate, and appropriately instantiating that predicate's type argument at each application site. For example, the structural induction rule for the nested type

$$\mathtt{PTree\ a\ =\ PLeaf\ a\ |\ PNode\ (PTree\ (a \times a))}$$

of perfect trees is

$$\begin{array}{l}
\forall(\mathtt{P} : \forall(\mathtt{a} : \mathtt{Set}) \to \mathtt{PTree\ a} \to \mathtt{Set}) \to \\
\quad (\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{x} : \mathtt{a}) \to \mathtt{P\ a\ (PLeaf\ x)}) \to \\
\quad (\forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{x} : \mathtt{PTree\ (a \times a)}) \to \mathtt{P\ (a \times a)\ x} \to \mathtt{P\ a\ (PNode\ x)}) \to \\
\quad \forall(\mathtt{a} : \mathtt{Set})\,(\mathtt{x} : \mathtt{PTree\ a}) \to \mathtt{P\ a\ x}
\end{array}$$

and the structural induction rule for the nested type

    data Lam a where
      Var :: a → Lam a
      App :: Lam a → Lam a → Lam a
      Abs :: Lam (Maybe a) → Lam a

of de Bruijn encoded lambda terms [9] with variables of type a is

$$\begin{array}{l} \forall (\mathtt{P} : \forall (\mathtt{a} : \mathtt{Set}) \to \mathtt{Lam} \,\mathtt{a} \to \mathtt{Set}) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{x} : \mathtt{a}) \to \mathtt{P} \,\mathtt{a} \,(\mathtt{Var} \,\mathtt{x})) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{x} : \mathtt{Lam} \,\mathtt{a}) \,(\mathtt{y} : \mathtt{Lam} \,\mathtt{a}) \to \mathtt{P} \,\mathtt{a} \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,\mathtt{y} \to \mathtt{P} \,\mathtt{a} \,(\mathtt{App} \,\mathtt{x} \,\mathtt{y})) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{x} : \mathtt{Lam}\,(\mathtt{Maybe} \,\mathtt{a})) \to \mathtt{P} \,(\mathtt{Maybe} \,\mathtt{a}) \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,(\mathtt{Abs} \,\mathtt{x})) \to \\ \quad \forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{x} : \mathtt{Lam} \,\mathtt{a}) \to \mathtt{P} \,\mathtt{a} \,\mathtt{x} \end{array}$$
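Rules of this shape are inhabited by folds. As a purely illustrative sketch (the tagged-tuple encoding of perfect trees and the names `ptree_ind` and `count` are our own, not part of the paper's development), structural induction for perfect trees can be run in Python as:

```python
# Structural induction / fold for perfect trees, encoded as
# ('PLeaf', x) or ('PNode', t) with t a perfect tree at instance a × a.
def ptree_ind(p_leaf, p_node, t):
    if t[0] == 'PLeaf':
        return p_leaf(t[1])
    sub = t[1]
    return p_node(sub, ptree_ind(p_leaf, p_node, sub))

# Instance: count the a-values stored. A PLeaf at instance b stores one b,
# and each value of type a × a contains two a-values, so each PNode doubles.
count = lambda t: ptree_ind(lambda x: 1, lambda sub, p: 2 * p, t)

t = ('PNode', ('PNode', ('PLeaf', ((1, 2), (3, 4)))))
print(count(t))  # 4
```

The proof value is built exactly as the rule prescribes: a base value at PLeaf, transformed once per PNode, with the predicate's type argument changing silently at each recursive step.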

Deep induction rules for nested types must similarly account for their type constructors' changing type parameters while also parameterizing over the additional predicate on the type of data they contain. Letting Pair^ Q be the lifting of a predicate Q on a to pairs of type a × a, so that Pair^ Q (x, y) = Q x × Q y, this gives the deep induction rule

$$\begin{array}{l} \forall (\mathtt{P} : \forall (\mathtt{a} : \mathtt{Set}) \to (\mathtt{a} \to \mathtt{Set}) \to \mathtt{PTree}\ \mathtt{a} \to \mathtt{Set}) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{a}) \to \mathtt{Q} \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,(\mathtt{PLeaf} \,\mathtt{x})) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{PTree}\ (\mathtt{a} \times \mathtt{a})) \to \mathtt{P} \,(\mathtt{a} \times \mathtt{a}) \,(\mathtt{Pair}^{\wedge}\, \mathtt{Q}) \,\mathtt{x} \to \\ \qquad \qquad \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,(\mathtt{PNode} \,\mathtt{x})) \to \\ \quad \forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{PTree}\ \mathtt{a}) \to \mathtt{PTree}^{\wedge}\, \mathtt{Q} \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,\mathtt{x} \end{array}$$

for perfect trees, and the deep induction rule

$$\begin{array}{l} \forall (\mathtt{P} : \forall (\mathtt{a} : \mathtt{Set}) \to (\mathtt{a} \to \mathtt{Set}) \to \mathtt{Lam} \,\mathtt{a} \to \mathtt{Set}) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{a}) \to \mathtt{Q} \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,(\mathtt{Var} \,\mathtt{x})) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{Lam} \,\mathtt{a}) \,(\mathtt{y} : \mathtt{Lam} \,\mathtt{a}) \to \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,\mathtt{y} \to \\ \qquad \qquad \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,(\mathtt{App} \,\mathtt{x} \,\mathtt{y})) \to \\ \quad (\forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{Lam} \,(\mathtt{Maybe} \,\mathtt{a})) \to \mathtt{P} \,(\mathtt{Maybe} \,\mathtt{a}) \,(\mathtt{Maybe}^{\wedge}\, \mathtt{Q}) \,\mathtt{x} \to \\ \qquad \qquad \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,(\mathtt{Abs} \,\mathtt{x})) \to \\ \quad \forall (\mathtt{a} : \mathtt{Set}) \,(\mathtt{Q} : \mathtt{a} \to \mathtt{Set}) \,(\mathtt{x} : \mathtt{Lam} \,\mathtt{a}) \to \mathtt{Lam}^{\wedge}\, \mathtt{Q} \,\mathtt{x} \to \mathtt{P} \,\mathtt{a} \,\mathtt{Q} \,\mathtt{x} \end{array}$$

for lambda terms. As usual, the structural induction rules for these types can be recovered by setting Q = λx. True in their deep induction rules. Moreover, the basic technique described in Section 2 can be recovered from the more general one described in this section by noting that the type arguments to ADT data type constructors don't change, and that the internal predicate parameter to P can therefore be lifted to the outermost level of ADT induction rules.
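To make the liftings concrete, here is a minimal Python sketch; the names `pair_lift` and `maybe_lift`, and the run-time modeling of predicates as Boolean-valued functions, are our own illustrative choices rather than part of the paper's Agda development:

```python
# A predicate on a type is modeled as a Boolean-valued function.

# pair_lift mirrors Pair^: Pair^ Q (x, y) = Q x × Q y.
def pair_lift(Q):
    return lambda p: Q(p[0]) and Q(p[1])

# maybe_lift mirrors Maybe^, with None modeling Nothing
# and any other value modeling Just.
def maybe_lift(Q):
    return lambda m: m is None or Q(m)

# Setting Q = λx. True makes every lifted predicate trivially true,
# which is how structural induction is recovered from deep induction.
trivial = lambda x: True
positive = lambda n: n > 0

print(pair_lift(positive)((1, 2)))   # True: both components positive
print(maybe_lift(positive)(None))    # True: Nothing satisfies any lifted predicate
print(pair_lift(trivial)((0, -5)))   # True: the trivial predicate always holds
```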

We conclude this section by giving both structural and deep induction rules for the following truly nested type of bushes [8]:

Bush a = BNil | BCons a (Bush (Bush a))

(Note that this type is not even definable in Agda.) Correct and useful structural induction rules for bushes and other truly nested types have long been elusive. One recent effort to derive such rules has been recorded in [10], but the approach taken there is more ad hoc than not, and generates induction rules for data types related to the nested types of interest rather than for the original nested types themselves. To treat bushes, for example, Fu and Selinger rewrite the type Bush a as NBush (Succ Zero) a, where NBush = NTimes Bush and

    NTimes :: (Set → Set) → Nat → Set → Set
    NTimes p Zero s = s
    NTimes p (Succ n) s = p (NTimes p n s)
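At the level of type expressions, this iteration can be sketched in Python, representing type constructors as functions on type expressions encoded as nested tuples (an encoding of our own choosing):

```python
# NTimes p n s applies the "type constructor" p to s, n times:
# NTimes p Zero s = s;  NTimes p (Succ n) s = p (NTimes p n s).
def n_times(p, n, s):
    return s if n == 0 else p(n_times(p, n - 1, s))

# A type expression for Bush t is encoded as the tuple ('Bush', t).
bush = lambda t: ('Bush', t)

# NBush = NTimes Bush, so NBush 2 a is Bush (Bush a).
print(n_times(bush, 2, 'a'))  # ('Bush', ('Bush', 'a'))
```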

Their induction rule for bushes is then given in terms of these rewritten ones as

$$\begin{array}{l} \forall (\mathtt{a} : \mathtt{Set})\,(\mathtt{P} : \forall (\mathtt{n} : \mathtt{Nat}) \to \mathtt{NBush} \,\mathtt{n} \,\mathtt{a} \to \mathtt{Set}) \to \\ \quad (\forall (\mathtt{x} : \mathtt{a}) \to \mathtt{P} \,\mathtt{Zero} \,\mathtt{x}) \to \\ \quad (\forall (\mathtt{n} : \mathtt{Nat}) \to \mathtt{P} \,(\mathtt{Succ} \,\mathtt{n}) \,\mathtt{BNil}) \to \\ \quad (\forall (\mathtt{n} : \mathtt{Nat})\,(\mathtt{x} : \mathtt{NBush} \,\mathtt{n} \,\mathtt{a})\,(\mathtt{xs} : \mathtt{NBush} \,(\mathtt{Succ} \,(\mathtt{Succ} \,\mathtt{n})) \,\mathtt{a}) \to \\ \qquad \qquad \mathtt{P} \,\mathtt{n} \,\mathtt{x} \to \mathtt{P} \,(\mathtt{Succ} \,(\mathtt{Succ} \,\mathtt{n})) \,\mathtt{xs} \to \mathtt{P} \,(\mathtt{Succ} \,\mathtt{n})\,(\mathtt{BCons} \,\mathtt{x} \,\mathtt{xs})) \to \\ \quad \forall (\mathtt{n} : \mathtt{Nat})\,(\mathtt{xs} : \mathtt{NBush} \,\mathtt{n} \,\mathtt{a}) \to \mathtt{P} \,\mathtt{n} \,\mathtt{xs} \end{array}$$

This approach appears promising, but is not yet fully mature. The core difficulty is that, although Fu and Selinger "hint at how the construction ... can be generalized to arbitrary nested types" and "give an example of nested data type [sic] that is hopefully general enough to make it clear what one would do in the general case" in Section 5 of [10], they do not show how to derive their induction rules in a uniform and principled way even for the "reasonably arbitrary and general" nested types they consider. As a result, it is unclear what guarantees that the induction rules they derive are correct, either for the original nested types or for their rewritten versions, or whether the induction rules for the rewritten nested types are sufficiently expressive to prove all results about the original nested types that one would expect to be provable by induction. This latter point echoes the issue with Coq-derived induction rules for forests mentioned above, and has the unfortunate effect of forcing users to manually write induction (and other) rules for such types for use in that system [17].

Direct application of the general technique illustrated above and explicated in full in Section 4 below derives the following first-ever useful induction rule for bushes, a full 20 years after bushes were first introduced!

$$\begin{array}{l} \forall (\mathsf{P} : \forall (\mathsf{a} : \mathsf{Set}) \to \mathsf{Bush}\ \mathsf{a} \to \mathsf{Set}) \to \\ \quad (\forall (\mathsf{a} : \mathsf{Set}) \to \mathsf{P}\ \mathsf{a}\ \mathsf{BNil}) \to \\ \quad (\forall (\mathsf{a} : \mathsf{Set})\ (\mathsf{x} : \mathsf{a})\ (\mathsf{y} : \mathsf{Bush}\ (\mathsf{Bush}\ \mathsf{a})) \to \mathsf{P}\ (\mathsf{Bush}\ \mathsf{a})\ \mathsf{y} \to \mathsf{P}\ \mathsf{a}\ (\mathsf{BCons}\ \mathsf{x}\ \mathsf{y})) \to \\ \quad \forall (\mathsf{a} : \mathsf{Set})\ (\mathsf{x} : \mathsf{Bush}\ \mathsf{a}) \to \mathsf{P}\ \mathsf{a}\ \mathsf{x} \end{array}$$

In the next section we will see that this rule is derivable from the following more general one:

$$\begin{array}{l} \forall (\mathsf{P} : \forall (\mathsf{a} : \mathsf{Set}) \to (\mathsf{a} \to \mathsf{Set}) \to \mathsf{Bush}\ \mathsf{a} \to \mathsf{Set}) \to \\ \quad (\forall (\mathsf{a} : \mathsf{Set})\ (\mathsf{Q} : \mathsf{a} \to \mathsf{Set}) \to \mathsf{P}\ \mathsf{a}\ \mathsf{Q}\ \mathsf{BNil}) \to \\ \quad (\forall (\mathsf{a} : \mathsf{Set})\ (\mathsf{Q} : \mathsf{a} \to \mathsf{Set})\ (\mathsf{x} : \mathsf{a})\ (\mathsf{y} : \mathsf{Bush}\ (\mathsf{Bush}\ \mathsf{a})) \to \\ \qquad \qquad \mathsf{Q}\ \mathsf{x} \to \mathsf{P}\ (\mathsf{Bush}\ \mathsf{a})\ (\mathsf{P}\ \mathsf{a}\ \mathsf{Q})\ \mathsf{y} \to \mathsf{P}\ \mathsf{a}\ \mathsf{Q}\ (\mathsf{BCons}\ \mathsf{x}\ \mathsf{y})) \to \\ \quad \forall (\mathsf{a} : \mathsf{Set})\ (\mathsf{Q} : \mathsf{a} \to \mathsf{Set})\ (\mathsf{x} : \mathsf{Bush}\ \mathsf{a}) \to \mathsf{Bush}^{\wedge}\ \mathsf{Q}\ \mathsf{x} \to \mathsf{P}\ \mathsf{a}\ \mathsf{Q}\ \mathsf{x} \end{array}$$
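The lifting Bush^ appearing in the conclusion is itself defined by the same kind of "deep" recursion: Bush^ Q holds trivially of BNil, and of BCons x y it requires Q x together with Bush^ (Bush^ Q) y, since y has type Bush (Bush a). A minimal Python sketch, with bushes encoded as `'BNil'` or `('BCons', x, y)` (an encoding of our own choosing):

```python
# Bush^ Q: lift a predicate Q on a to a predicate on Bush a.
# The recursive call lifts Bush^ Q itself, matching the fact that
# the tail of BCons has type Bush (Bush a).
def bush_lift(Q):
    def go(b):
        if b == 'BNil':
            return True
        _, x, y = b                     # b = ('BCons', x, y)
        return Q(x) and bush_lift(bush_lift(Q))(y)
    return go

# A small bush of integers: BCons 1 (BCons (BCons 2 BNil) BNil).
b = ('BCons', 1, ('BCons', ('BCons', 2, 'BNil'), 'BNil'))
print(bush_lift(lambda n: n > 0)(b))  # True: every stored integer is positive
```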

## **4 Theoretical Foundations**

This section gives a grammar generating a robust class of nested types, including ADTs and truly nested types, and recaps the semantics given in [13] for them from which we derive their deep induction rules. This entire paper can thus be read as a practical application of the abstract results of [13].

## **4.1 Categorical Preliminaries**

We write a : A if A is a category and a is an object of A. We write 0_A and 1_A for the initial and terminal objects of A, and o_A and !_A for the unique maps o_A : 0_A → A and !_A : A → 1_A, respectively. If A is the category Set of sets and functions between them, we write 0 for 0_Set, i.e., for ∅, and 1 for any 1-element set, i.e., for 1_Set. If a : A we write K_a for the constantly a-valued functor on A. The category Fam, which we will use to interpret predicates, is given by:

**Definition 1.** The category Fam comprises the following:

**–** *Objects*: pairs (A, P), where A is a set and P : A → Set is a predicate on A, regarded as an A-indexed family of sets.

**–** *Morphisms*: a morphism (α, β) : (A, P) → (B, Q) is a pair of maps α : A → B and β : Π_{a∈A} P a → Q (α a). Identities and composition are given componentwise.

## **4.2 Syntax and Semantics of ADTs**

If 𝒱 is a countable set of type variables, V ⊆ 𝒱 is finite, α ∈ 𝒱, and we write V,α for V ∪ {α}, then the following grammar generates (representations of) all standard polynomial ADTs over V, i.e., all ADTs defined over data of primitive types:

$$\mathcal{A}^V := 0 \mid 1 \mid \alpha \in V \mid \mathcal{A}^V + \mathcal{A}^V \mid \mathcal{A}^V \times \mathcal{A}^V \mid \mu\alpha.\,\mathcal{A}^{V,\alpha}$$

The grammar A = ⋃_V A^V also generates (representations of) deep ADTs, i.e., ADTs defined not just over data of the primitive types, but over data of other ADTs as well. For example, it generates the representation List α := μβ. 1 + α × β of the type List a, the representation Forest α := μβ. 1 + α × μγ. 1 + β × γ of the type Forest a, and the representation μδ. 1 + (μβ. 1 + α × μγ. 1 + β × γ) × δ of the type List (Forest a). Using Bekič's Lemma, it can also generate (representations of) ADTs defined by mutual recursion, such as Expr := μα. s(α, μβ. t(α, β)) and BExpr := μβ. t(Expr, β), where s(α, β) := Nat + α × α + β × α × α and t(α, β) := Bool + β × β + β + α × α, for the ADTs of integer and boolean expressions from Section 1. ADTs with more than one type argument can be handled by tupling them into one or, equivalently, by noting that such ADTs are generated by the extension N of the grammar A given in Section 4.4. We adopt the usual conventions regarding free and bound type variables for A.

As usual, ADTs are interpreted relative to environments.

**Definition 2.** A set environment σ is a function from a finite subset V of 𝒱 to Set. We write Env_Set^V for the set of set environments whose domain is V. If A ∈ Set, σ ∈ Env_Set^V, and α ∉ V, then σ[α := A] is the set environment with domain V,α that extends σ by mapping α to A. We write σα in place of σ(α) for the image of α under σ, and [] for the set environment with domain V = ∅.

It is well-known that the ADTs generated by the grammar A have initial algebra semantics in the category Set. That is, each such ADT μα. E can be interpreted as the carrier μF of the initial algebra for the polynomial endofunctor F on Set that interprets its body E. In particular, the final clause of the next definition is well-defined.

**Definition 3.** The interpretation function ⟦·⟧^Set : A^V → Env_Set^V → Set is:

$$\begin{array}{c} \left[0\right]^{\mathsf{Set}}\sigma = 0\\ \left[1\right]^{\mathsf{Set}}\sigma = 1\\ \left[\alpha\right]^{\mathsf{Set}}\sigma = \alpha\sigma\\ \left[E\_1 + E\_2\right]^{\mathsf{Set}}\sigma = \left[E\_1\right]^{\mathsf{Set}}\sigma + \left[E\_2\right]^{\mathsf{Set}}\sigma\\ \left[E\_1 \times E\_2\right]^{\mathsf{Set}}\sigma = \left[E\_1\right]^{\mathsf{Set}}\sigma \times \left[E\_2\right]^{\mathsf{Set}}\sigma\\ \left[\mu\alpha.\ E\right]^{\mathsf{Set}}\sigma = \mu(A \mapsto \left[E\right]^{\mathsf{Set}}\sigma[\alpha := A]) \end{array}$$
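The final clause can be made concrete by iterating the interpreted functor from the empty set. A Python sketch for ⟦μβ. 1 + α × β⟧ with σα = {0, 1} (the encodings of 1 as `'nil'` and of products as pairs are our own illustrative choices):

```python
# F S = 1 + A × S, with 'nil' encoding the unique element of 1
# and pairs (a, s) encoding elements of A × S.
A = {0, 1}

def F(S):
    return {'nil'} | {(a, s) for a in A for s in S}

# Iterate F from the empty set: F^0(0) ⊆ F^1(0) ⊆ F^2(0) ⊆ ...
S = set()
for _ in range(4):
    S = F(S)

# After n iterations, S contains all lists over A of length < n,
# so after 4 iterations |S| = 1 + 2 + 4 + 8 = 15.
print(len(S))  # 15
```

The union of these finite approximations is exactly the set of lists over A, i.e., the least fixed point demanded by the μ clause.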

Like Set, the category Fam has sufficient structure to interpret ADTs generated by the grammar <sup>A</sup>. In particular, it interprets bodies of polynomial ADTs.

**Definition 4.** The category Fam supports the following constructions:


**–** *Coproducts*: Given (A, P), (A′, P′) : Fam, the coproduct (A, P) + (A′, P′) : Fam is (A + A′, [P, P′]), where [P, P′] (inL a) = P a and [P, P′] (inR a′) = P′ a′, with the evident injections inL and inR. The copairing (α, β) + (α′, β′) : (A, P) + (A′, P′) → (B, Q) of morphisms (α, β) : (A, P) → (B, Q) and (α′, β′) : (A′, P′) → (B, Q) is (α + α′, δ), where δ : Π_{x∈A+A′} (P + P′) x → Q ((α + α′) x) is defined by δ (inL a) = β a and δ (inR a′) = β′ a′. As expected, ((α, β) + (α′, β′)) ◦ inL = (α, β) and ((α, β) + (α′, β′)) ◦ inR = (α′, β′).

**–** *Products*: Given (A, P), (A′, P′) : Fam, the product (A, P) × (A′, P′) : Fam is (A × A′, λ(a, a′) : A × A′. P a × P′ a′). The associated projections π₁ : (A, P) × (A′, P′) → (A, P) and π₂ : (A, P) × (A′, P′) → (A′, P′) are given by π₁ = (π₁, λ(a, a′) : A × A′. π₁) and π₂ = (π₂, λ(a, a′) : A × A′. π₂). The product (α, β) × (α′, β′) : (A, P) → (B, Q) × (B′, Q′) of morphisms (α, β) : (A, P) → (B, Q) and (α′, β′) : (A, P) → (B′, Q′) is (λa : A. (α a, α′ a), λa : A. λx : P a. (β a x, β′ a x)). As expected, π₁ ◦ ((α, β) × (α′, β′)) = (α, β) and π₂ ◦ ((α, β) × (α′, β′)) = (α′, β′).

To interpret ADTs generated by A in Fam we also need to be able to interpret expressions of the form μα.E. This we do by computing the least fixed point in Fam of the functor G : Fam → Fam interpreting E. It is natural to try to do this using the same technique in Fam that gives its Set-interpretation, i.e., by iterating G ω-many times starting from the initial object 0 of Fam. This gives the least fixed point μG of G as the colimit G^ω 0 in Fam of the sequence

$$0 \hookrightarrow G0 \hookrightarrow G^2 0 \hookrightarrow \dots \hookrightarrow G^n 0 \hookrightarrow \dots \tag{\*}$$

This approach is indeed viable, and is formally justified by [13]. There, it is shown that if λ is a regular cardinal, C is a locally λ-presentable category, and G : C → C is a λ-accessible functor drawn from a particular class of functors that goes far beyond just first-order polynomial ones, then the least fixed point μG of G exists in C and can be computed as the transfinite colimit G^λ 0 of the sequence 0 → G0 → G²0 → ... → G^n 0 → ... → G^ω 0 → ... → G^α 0 → ... over all α < λ. That the sequence (\*) computes μG for all polynomial functors on Fam then follows by taking λ to be ω, noting that Fam is locally finitely presentable, and recalling that all such functors are ω-accessible. That (\*) further computes μG for every functor G on Fam that interprets an expression generated by A now follows easily by structural induction. We record this as:

**Theorem 1.** If G : Fam → Fam is a functor interpreting an expression (with a distinguished variable) generated by the grammar A, then the least fixed point μG of G (with respect to that variable) is G^ω 0. Concretely, the colimit G^ω 0 can be computed as colim_{n∈ℕ} (A_n, P_n) = (A, P), where A = colim_{n∈ℕ} A_n with mediating morphisms α_n : A_n → A, and P is defined by P x = colim_{n∈ℕ, y∈α_n⁻¹(x)} P_n y.

To define interpretations in Fam for ADTs generated by A we need the following analogue of Definition 2:

**Definition 5.** A predicate environment ρ is a function from a finite subset V of 𝒱 to Fam. We write Env_Fam^V for the set of predicate environments whose domain is V. If (A, P) ∈ Fam, ρ ∈ Env_Fam^V, and α ∉ V, we write ρ[α := (A, P)] for the predicate environment with domain V,α that extends ρ by mapping α to (A, P). We write αρ in place of ρ(α) for the image of α under ρ.

Let σ ∈ Env_Set^V. If ρ ∈ Env_Fam^V is such that π₁(αρ) = ασ for all α ∈ V then we say that ρ is a lifting of σ. We write σ̄ for the particular lifting ρ of σ such that αρ = (ασ, K₁) for all α ∈ V. In addition, if ρ ∈ Env_Fam^V maps each α ∈ V to (A_α, P_α) then we write π₁ρ for the set environment with domain V mapping each α ∈ V to A_α. We write [] for the unique environment with domain V = ∅.

We then have the following Fam-interpretations for ADTs generated by A:

**Definition 6.** The interpretation function ⟦·⟧^Fam : A^V → Env_Fam^V → Fam is:

$$\begin{array}{l} \llbracket 0 \rrbracket^{\mathsf{Fam}}\rho = \underline{0} \\ \llbracket 1 \rrbracket^{\mathsf{Fam}}\rho = \underline{1} \\ \llbracket \alpha \rrbracket^{\mathsf{Fam}}\rho = \alpha\rho \\ \llbracket E\_1 + E\_2 \rrbracket^{\mathsf{Fam}}\rho = \llbracket E\_1 \rrbracket^{\mathsf{Fam}}\rho + \llbracket E\_2 \rrbracket^{\mathsf{Fam}}\rho \\ \llbracket E\_1 \times E\_2 \rrbracket^{\mathsf{Fam}}\rho = \llbracket E\_1 \rrbracket^{\mathsf{Fam}}\rho \times \llbracket E\_2 \rrbracket^{\mathsf{Fam}}\rho \\ \llbracket \mu\alpha. E \rrbracket^{\mathsf{Fam}}\rho = \mu((A,Q) \mapsto \llbracket E \rrbracket^{\mathsf{Fam}}\rho[\alpha := (A,Q)]) \end{array}$$

Before showing how to derive induction rules for the ADTs generated by A we prove two crucial lemmas linking their Set- and Fam-interpretations.

**Lemma 1.** If E ∈ A^V and ρ ∈ Env_Fam^V, then π₁(⟦E⟧^Fam ρ) = ⟦E⟧^Set(π₁ρ). Furthermore, if π₂(βρ) = K₁ for all β ∈ V, then π₂(⟦E⟧^Fam ρ) = K₁.

Proof. By induction on the structure of expressions. The only non-trivial case is for μα.E ∈ A^V. Let ρ ∈ Env_Fam^V be given. Letting F : Set → Set be defined by F A = ⟦E⟧^Set(π₁ρ)[α := A] and G : Fam → Fam be defined by G(A, Q) = ⟦E⟧^Fam ρ[α := (A, Q)], the induction hypothesis gives

$$\pi\_1(G(A,Q)) = \pi\_1(\llbracket E \rrbracket^{\mathsf{Fam}} \rho[\alpha := (A,Q)]) = \llbracket E \rrbracket^{\mathsf{Set}}(\pi\_1 \rho)[\alpha := A] = FA \qquad (\dagger)$$

and if π₂(βρ) = K₁ for all β ∈ V then, moreover, π₂(G(A, K₁)) = K₁. We then have π₁(⟦μα.E⟧^Fam ρ) = π₁(μ((A, Q) ↦ ⟦E⟧^Fam ρ[α := (A, Q)])) = π₁(μG) = π₁(colim_{n∈ℕ} G^n 0) = colim_{n∈ℕ} π₁(G^n 0) = colim_{n∈ℕ} F^n 0 = μF = μ(A ↦ ⟦E⟧^Set(π₁ρ)[α := A]) = ⟦μα.E⟧^Set(π₁ρ). Here, the fourth equality is justified by Theorem 1, and the fifth is justified by (†) and induction on n. If π₂(βρ) = K₁ for all β ∈ V as well, then π₂(⟦μα.E⟧^Fam ρ) = π₂(μ((A, Q) ↦ ⟦E⟧^Fam ρ[α := (A, Q)])) = π₂(μG) = π₂(colim_{n∈ℕ} G^n 0) = π₂(colim_{n∈ℕ} (F^n 0, K₁)) = λx. colim_{n∈ℕ, y∈α_n⁻¹(x)} K₁ y = K₁. Here, the morphisms α_n : F^n 0 → μF are the mediating morphisms for the colimit, as in Theorem 1, and the fourth equality is justified by the fact that π₂(G(A, K₁)) = K₁ and induction on n.

**Corollary 1.** If E is closed then ⟦E⟧^Fam [] = (⟦E⟧^Set [], K₁).

**Lemma 2.** If σ ∈ Env_Set^V, and if F : Set → Set and G : Fam → Fam are given by F A = ⟦E⟧^Set σ[α := A] and G(A, Q) = ⟦E⟧^Fam σ̄[α := (A, Q)], then μG = (μF, K₁).

Proof. We have μG = μ((A, Q) ↦ ⟦E⟧^Fam σ̄[α := (A, Q)]) = ⟦μα.E⟧^Fam σ̄ = (⟦μα.E⟧^Set σ, K₁) = (μF, K₁), where the third equality holds by Lemma 1.

#### **4.3 Induction Rules for ADTs**

To derive induction rules for the ADTs generated by A, we first observe that, given an ADT μα.E ∈ A^V and a set environment σ ∈ Env_Set^V interpreting its free variables, the interpretation ⟦E⟧^Set σ defines a functor F_σ A = ⟦E⟧^Set σ[α := A] such that ⟦μα.E⟧^Set σ = μ(A ↦ ⟦E⟧^Set σ[α := A]) = μ(A ↦ F_σ A) = μF_σ. We can therefore think of F_σ as representing the data type constructor associated with the ADT. Thus, as argued in [11,12], the semantic induction rule for proving predicates over the σ-instance of the ADT μα.E has the form

$$\forall (P: \mu F\_{\sigma} \to \mathbf{Set}). \; ??? \to \forall (x: \mu F\_{\sigma}). \; Px$$

for some appropriate hypotheses ???. We can use the Fam-interpretation of E to discover a semantic counterpart to the hypotheses ???. Reflecting the resulting semantic rule for the σ-instance of μα.E back into the programming language syntax will then derive induction rules for polynomial ADTs.

To deduce what ??? is, we first observe that the conclusion ∀(x : μF_σ). P x of the induction rule for the σ-instance of μα.E is isomorphic to the type of the second component of a morphism in Fam from (μF_σ, K₁) to (μF_σ, P) whose first component is id. Lemma 1 suggests that if we can see (μF_σ, K₁) as μG for some functor G : Fam → Fam, then we can fold over a G-algebra on (μF_σ, P) in Fam to get such a morphism, i.e., to inhabit the type that is the structural induction rule for the σ-instance of μα.E. This will provide a proof ind_{μα.E, σ̄} P that the property P holds for all elements of the σ-instance of μα.E.

To this end, let ρ ∈ Env_Fam^V be any lifting of σ, and consider again the functor F̂_ρ(A, Q) = ⟦E⟧^Fam ρ[α := (A, Q)] on Fam given in Lemma 1 (there called G). An F̂_ρ-algebra structure on (μF_σ, P) is a morphism (k′, k) : F̂_ρ(μF_σ, P) → (μF_σ, P) in Fam. Then π₁(F̂_ρ(μF_σ, P)) = π₁(⟦E⟧^Fam ρ[α := (μF_σ, P)]) = (π₁(⟦E⟧^Fam ρ))[α := μF_σ] = ⟦E⟧^Set σ[α := μF_σ] = F_σ(μF_σ), with the third equality holding by Lemma 1. If we take k′ = in, then k : ∀(x : F_σ(μF_σ)). π₂(⟦E⟧^Fam ρ[α := (μF_σ, P)]) x → P(in x), so that

$$\begin{array}{l} \mathit{ind}\_{\mu\alpha.E,\rho} : \forall (P : \mu F\_{\sigma} \to \mathsf{Set}). \\ \qquad (\forall (x : F\_{\sigma}(\mu F\_{\sigma})).\ \pi\_2(\llbracket E \rrbracket^{\mathsf{Fam}}\rho[\alpha := (\mu F\_{\sigma}, P)])\,x \to P(\mathit{in}\,x)) \\ \qquad \to \forall (x : \mu F\_{\sigma}).\ P x \\ \mathit{ind}\_{\mu\alpha.E,\rho}\ P\,k\,x = \pi\_2\big(\mathit{fold}^{\mathsf{Fam}}\_{\mu\alpha.E,\rho}(\mathit{in}, k)\big)\,x\ () \end{array}$$

Here, fold^Fam_{μα.E,ρ}(in, k) is the unique F̂_ρ-algebra morphism from in : F̂_ρ(μF̂_ρ) → μF̂_ρ to (in, k) in Fam.

Taking ρ = σ̄ in the above development derives the expected structural induction rules for ADTs generated by A. But this development is actually far more flexible, since the induction rule it derives is parameterized over an arbitrary lifting ρ of the set environment σ, and only later specialized to σ̄ to obtain structural induction rules for ADTs. The non-specialized rule can therefore be used to prove properties of ADTs that are parameterized over non-trivial (i.e., non-K₁) predicates on the type parameters to the type constructors induced by those ADTs; these are precisely our deep induction rules for ADTs.
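The computational content of ind_{μα.E,ρ} is just a fold. For lists, the derived deep induction rule can be sketched in Python with "proofs" modeled as ordinary values (the function names and the use of Python lists are our own illustrative choices):

```python
# deep_list_ind builds a proof of P xs from: a proof of P Nil; a step
# function taking y, a proof of Q y, and a proof of P ys to a proof of
# P (Cons y ys); and a function producing a proof of Q y for each y.
def deep_list_ind(p_nil, p_cons, q_proof, xs):
    proof = p_nil
    for y in reversed(xs):            # fold from the tail up, as Cons nests
        proof = p_cons(y, q_proof(y), proof)
    return proof

# Instance: the "proof" of P xs is the length of xs, built by induction.
length_proof = deep_list_ind(
    0,                                # P Nil: the empty list has length 0
    lambda y, qy, p: p + 1,           # P (Cons y ys) from P ys
    lambda y: y > 0,                  # a Q-proof supplied for each element
    [5, 6, 7],
)
print(length_proof)  # 3
```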

As expected, the conclusion of an ADT's deep induction rule will have an additional hypothesis involving the lifting of this predicate to that ADT. As we have seen, the ability to lift a predicate Q on a set A to a predicate T^ Q on T A, where T is an ADT's type constructor, is therefore central to deep induction. Every type constructor for every ADT generated by the grammar A has such a lifting. Concretely, it is computed as the second component of the interpretation in Fam of that data type. For example, the lifting List^ Q : List A → Set is π₂(⟦μβ. 1 + α × β⟧^Fam [α := (A, Q)]). This can be coded in Agda as

$$\begin{array}{l} \mathsf{List}^{\wedge} : \forall \{\mathsf{a} : \mathsf{Set}\} \to (\mathsf{a} \to \mathsf{Set}) \to (\mathsf{List}\ \mathsf{a} \to \mathsf{Set}) \\ \mathsf{List}^{\wedge}\ \mathsf{Q}\ \mathsf{Nil} = \top \\ \mathsf{List}^{\wedge}\ \mathsf{Q}\ (\mathsf{Cons}\ \mathsf{x}\ \mathsf{xs}) = \mathsf{Q}\ \mathsf{x} \times \mathsf{List}^{\wedge}\ \mathsf{Q}\ \mathsf{xs} \end{array}$$
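The same lifting can be mirrored at run time; a minimal Python sketch with lists as ordinary Python lists (the name `list_lift` is ours):

```python
# list_lift mirrors List^: List^ Q Nil = ⊤ and
# List^ Q (Cons x xs) = Q x × List^ Q xs, i.e., Q holds of every element.
def list_lift(Q):
    def go(xs):
        if not xs:
            return True               # List^ Q Nil = ⊤
        return Q(xs[0]) and go(xs[1:])
    return go

even = lambda n: n % 2 == 0
print(list_lift(even)([2, 4, 6]))  # True
print(list_lift(even)([2, 3]))     # False
```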

Example 1. The deep induction rule for lists can be computed as the type of ind_{List α, ρ} for the ADT List α := μβ. 1 + α × β and the predicate environment ρ = [α := (A, Q)] for (A, Q) ∈ Fam. Letting F Y = ⟦1 + α × β⟧^Set(π₁ρ)[β := Y] = 1 + A × Y with the obviously named injections, we have that μF = List A. This gives the deep induction rule

$$\begin{array}{l} \mathit{ind}\_{\mathit{List}\,\alpha,\rho} : \forall (P : \mu F \to \mathsf{Set}). \forall (Q : A \to \mathsf{Set}). \\ \qquad (\forall (x : F(\mu F)).\ \pi\_2(\llbracket 1 + \alpha \times \beta \rrbracket^{\mathsf{Fam}}[\alpha := (A,Q), \beta := (\mu F, P)])\,x \to P(\mathit{in}\,x)) \\ \qquad \to \forall (x : \mu F).\ \mathit{List}^{\wedge}\, Q\, x \to P\, x \end{array}$$

Simplifying π₂'s argument gives (1, K₁) + (A, Q) × (μF, P). Its predicate part, obtained by applying π₂, is K₁ + (Q × P), so the hypotheses for ind_{List α,ρ} are

$$\begin{array}{l} \forall(x : 1 + A \times \mathit{List}\,A).\ (K\_1 + (Q \times P))\,x \to P(\mathit{in}\,x) \\ \quad = (\forall(x : 1).\ 1 \to P\ \mathit{Nil}) \times (\forall(y : A).\ \forall(ys : \mathit{List}\,A).\ Q\,y \to P\,ys \to P\,(\mathit{Cons}\ y\ ys)) \\ \quad = P\ \mathit{Nil} \times (\forall(y : A).\ \forall(ys : \mathit{List}\,A).\ Q\,y \to P\,ys \to P\,(\mathit{Cons}\ y\ ys)) \end{array}$$

Reflecting back into syntax gives the deep induction rule from Section 1:

$$\begin{array}{l} \forall (\mathsf{a} : \mathsf{Set})\ (\mathsf{P} : \mathsf{List}\ \mathsf{a} \to \mathsf{Set})\ (\mathsf{Q} : \mathsf{a} \to \mathsf{Set}) \to \\ \quad \mathsf{P}\ \mathsf{Nil} \to (\forall (\mathsf{y} : \mathsf{a})\ (\mathsf{ys} : \mathsf{List}\ \mathsf{a}) \to \mathsf{Q}\ \mathsf{y} \to \mathsf{P}\ \mathsf{ys} \to \mathsf{P}\ (\mathsf{Cons}\ \mathsf{y}\ \mathsf{ys})) \to \\ \quad \forall (\mathsf{xs} : \mathsf{List}\ \mathsf{a}) \to \mathsf{List}^{\wedge}\ \mathsf{Q}\ \mathsf{xs} \to \mathsf{P}\ \mathsf{xs} \end{array}$$

Taking Q = K₁ gives the usual structural induction rule for lists from Section 1.

Example 2. Since Forest a and List (Forest a) are mutually recursively defined, the deep induction rule for forests is defined by mutual recursion with the deep induction rule for lists. It can be computed as the type of ind_{Forest α, ρ} for the ADT Forest α := μβ. α × μγ. 1 + β × γ using the same technique as in Example 1. This gives the (deep) induction rule for forests from Section 1.

Example 3. Exactly the same technique delivers the deep induction rules from Section 1 for the mutually recursive ADTs Expr and BExpr whose representations are given before Definition 2.

#### **4.4 Syntax and Semantics of Nested Types**

We can use the technique from Section 4.3 to derive induction rules for nested types as well, including truly nested types and other deep nested types. To do so we first need an extension of the grammar A that generates these types.

Since nested types generalize ADTs to allow elements of a nested type at one instance of a type to depend on data at other instances of that nested type, they are interpreted as least fixed points not of ordinary (first-order) functors on Fam, as ADTs are, but rather as least fixed points of higher-order such functors. Moreover, since nested types can be parameterized over any number of type arguments, the (first-order) functors interpreting them can have correspondingly arbitrary arities. For each k ≥ 0 we therefore assume a countable set F_k of functor variables of arity k, disjoint for distinct k. We use lower case Greek letters for functor variables, write ϕ^k to indicate that ϕ ∈ F_k, and say that ϕ has arity k in this case. Type variables are exactly functor variables of arity 0; we continue to write α, β, etc., rather than α⁰, β⁰, etc., for them. We write F = ⋃_{k≥0} F_k. If V ⊆ F is finite and ϕ ∈ F_k for some k, we write V,ϕ for V ∪ {ϕ}.

**Definition 7.** For a finite subset V of F, the set of (truly) nested data types over V is generated by the following grammar:

$$\mathcal{N}^V := 0 \mid 1 \mid \varphi^k \overline{\mathcal{N}^V} \mid \mathcal{N}^V + \mathcal{N}^V \mid \mathcal{N}^V \times \mathcal{N}^V \mid (\mu \varphi^k.\lambda \alpha\_1...\alpha\_k.\mathcal{N}^{V,\alpha\_1,...,\alpha\_k,\varphi}) \overline{\mathcal{N}^V}$$

Here, ϕ^k ∈ V and the lengths of the vectors of terms in N^V in the third and final clauses of the above grammar are both k.

The grammar N = ⋃_V N^V generalizes A by allowing recursion not just at the level of type variables, but also at the level of functor variables. This reflects the fact that, in programming language syntax, nested types can be parameterized over both types and type constructors. For example, N^V generates the representation PTree α := (μϕ¹. λβ. β + ϕ(β × β)) α ∈ N^α of the type PTree a, the representation Lam α := (μϕ¹. λβ. β + ϕβ × ϕβ + ϕ(β + 1)) α ∈ N^α of the type Lam a, and the representation Bush α := (μϕ¹. λβ. 1 + β × ϕ(ϕβ)) α ∈ N^α of the type Bush a. But it also generates the representation GForest ϕ α := μβ. 1 + α × ϕβ ∈ N^{ϕ,α} of the following nested type of generalized forests with data of type a:

    data GForest f a = GFNil | GFNode a (f (GForest f a))

This type is higher-order in the sense that the type constructor GForest takes not just a type, but also a (unary) type constructor, as an argument. It therefore cannot be expressed as an element of $\mathcal{A}$, and thus demonstrates the benefit of working with the more expressive grammar $\mathcal{N}$. On the other hand, it is decidedly ADT-like, in the sense that it defines a family of inductive types rather than an inductive family of types. In fact, if f were a type constructor induced by a nested type generated by our grammar, then GForest f a and f (GForest f a) would be mutually recursively defined. In this case, generalizing Example 2, their structural induction rules would also be defined by mutual recursion.

It is not hard to see that $\mathcal{A} \subseteq \mathcal{N}$. Moreover, the grammar $\mathcal{N}$ allows nested types to be parameterized over (other) nested types, just as $\mathcal{A}$ allows ADTs to be parameterized over (other) ADTs. For instance, we could have perfect trees of lists or binary trees, bushes of perfect trees, etc.
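In Haskell syntax, the three nested types represented earlier in this section can be declared as follows (a standard rendering from the nested-types literature; constructor names may differ from the paper's):

```haskell
-- Perfect trees: the element type squares at each level of nesting.
data PTree a = PLeaf a | PNode (PTree (a, a))

-- Lambda terms over variables of type a: the body of an abstraction
-- lives over Maybe a, i.e. over a + 1, modelling one extra bound variable.
data Lam a = Var a | App (Lam a) (Lam a) | Abs (Lam (Maybe a))

-- Bushes: a truly nested type, since Bush is applied to Bush a
-- in the type of its own constructor.
data Bush a = BNil | BCons a (Bush (Bush a))
```

Each declaration is the syntactic counterpart of the corresponding $\mu$-representation: for example, `Bush` realizes $\mu\varphi^1.\lambda\beta.\,1 + \beta \times \varphi(\varphi\,\beta)$.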

We have the following notions of functor and application in Fam:

**Definition 8.** A ($k$-ary) lifted functor $G : \mathsf{Fam}^k \to \mathsf{Fam}$ is a pair $(F, P)$, where $F : \mathsf{Set}^k \to \mathsf{Set}$ and $P : \forall (X_1, P_1)\,...\,(X_k, P_k).\, F X_1 ... X_k \to \mathsf{Set}$ is a $\mathsf{Fam}$-indexed predicate. The application of a lifted functor $(F, P) : \mathsf{Fam}^k \to \mathsf{Fam}$ to an object $(A_1, Q_1), ..., (A_k, Q_k)$ of $\mathsf{Fam}^k$ is given by

$$(F,P)(A\_1,Q\_1)...(A\_k,Q\_k) = (FA\_1...A\_k, P(A\_1,Q\_1)...(A\_k,Q\_k))$$
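For intuition, a Boolean-valued shadow of Definition 8 can be sketched in Haskell: an object $(A, Q)$ of $\mathsf{Fam}$ is approximated by a type together with a decidable predicate on it, and a lifted functor pairs a functor on types with its action on predicates. The names `Pred`, `liftList`, and `liftPair` are ours:

```haskell
-- A decidable predicate on a, standing in for the second component Q
-- of an object (A, Q) of Fam.
type Pred a = a -> Bool

-- Lifting of the list functor: the first component of the lifted functor
-- is [] itself; the second sends a predicate Q on a to the predicate
-- "Q holds of every element" on [a].
liftList :: Pred a -> Pred [a]
liftList q = all q

-- Lifting of the binary product functor: componentwise conjunction.
liftPair :: Pred a -> Pred b -> Pred (a, b)
liftPair q r (x, y) = q x && r y
```

For example, `liftList even [2,4,6]` holds, while `liftList even [1,2]` does not, mirroring how a lifted functor applies componentwise to objects of $\mathsf{Fam}$.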

We call a lifted functor G = (F, P) a lifting of F from Set to Fam, and call P a Fam-indexed predicate. A Set-indexed predicate is a Fam-indexed predicate that does not depend on its arguments' second components. We extend the notions of set environment and predicate environment from Definitions 2 and 5 as follows:

**Definition 9.** A set environment $\sigma$ is a mapping with domain a finite subset $V = \{\varphi_1^{k_1}, ..., \varphi_n^{k_n}\}$ of $\mathcal{F}$ such that $\varphi_i\sigma : \mathsf{Set}^{k_i} \to \mathsf{Set}$ for $i = 1, ..., n$. We write $\mathit{Env}^{\mathsf{Set}}_V$ for the set of set environments whose domain is $V$. If $F \in \mathsf{Set}^k \to \mathsf{Set}$, $\sigma \in \mathit{Env}^{\mathsf{Set}}_V$, and $\varphi^k \notin V$, we write $\sigma[\varphi := F]$ for the set environment with domain $V,\varphi$ that extends $\sigma$ by mapping $\varphi$ to $F$. Similarly, a predicate environment $\rho$ is a mapping with domain a finite subset $V = \{\varphi_1^{k_1}, ..., \varphi_n^{k_n}\}$ of $\mathcal{F}$ such that $\varphi_i\rho : \mathsf{Fam}^{k_i} \to \mathsf{Fam}$ is a lifted functor for $i = 1, ..., n$. We write $\mathit{Env}^{\mathsf{Fam}}_V$ for the set of predicate environments whose domain is $V$. If $(F, P) \in \mathsf{Fam}^k \to \mathsf{Fam}$, $\rho \in \mathit{Env}^{\mathsf{Fam}}_V$, and $\varphi^k \notin V$, we write $\rho[\varphi := (F, P)]$ for the predicate environment with domain $V,\varphi$ that extends $\rho$ by mapping $\varphi$ to $(F, P)$.

The notions of a predicate environment being a lifting of a set environment and the notations $\overline{\sigma}$, $\pi_1\rho$, and $[\,]$ are now extended in the obvious ways.

The following interpretations of nested types generated by N in the locally finitely presentable categories Set and Fam are shown in [13] to be well-defined:

**Definition 10.** The interpretation functions $[\![\cdot]\!]^{\mathsf{Set}} : \mathcal{N}^V \to \mathit{Env}^{\mathsf{Set}}_V \to \mathsf{Set}$ and $[\![\cdot]\!]^{\mathsf{Fam}} : \mathcal{N}^V \to \mathit{Env}^{\mathsf{Fam}}_V \to \mathsf{Fam}$ are:

$$\begin{array}{rcl}
[\![0]\!]^{\mathsf{Set}}\sigma &=& 0\\
[\![1]\!]^{\mathsf{Set}}\sigma &=& 1\\
[\![\varphi^k E_1...E_k]\!]^{\mathsf{Set}}\sigma &=& (\varphi\sigma)\,([\![E_1]\!]^{\mathsf{Set}}\sigma)\,...\,([\![E_k]\!]^{\mathsf{Set}}\sigma)\\
[\![E_1 + E_2]\!]^{\mathsf{Set}}\sigma &=& [\![E_1]\!]^{\mathsf{Set}}\sigma + [\![E_2]\!]^{\mathsf{Set}}\sigma\\
[\![E_1 \times E_2]\!]^{\mathsf{Set}}\sigma &=& [\![E_1]\!]^{\mathsf{Set}}\sigma \times [\![E_2]\!]^{\mathsf{Set}}\sigma\\
[\![(\mu\varphi^k.\lambda\alpha_1...\alpha_k.E)\,E_1...E_k]\!]^{\mathsf{Set}}\sigma &=& (\mu(F \mapsto \lambda A_1...A_k.\,[\![E]\!]^{\mathsf{Set}}\sigma\overline{[\alpha_i := A_i]}[\varphi := F]))\,\overline{([\![E_i]\!]^{\mathsf{Set}}\sigma)}
\end{array}$$

$$\begin{array}{rcl}
[\![0]\!]^{\mathsf{Fam}}\rho &=& 0\\
[\![1]\!]^{\mathsf{Fam}}\rho &=& 1\\
[\![\varphi^k E_1...E_k]\!]^{\mathsf{Fam}}\rho &=& (\varphi\rho)\,([\![E_1]\!]^{\mathsf{Fam}}\rho)\,...\,([\![E_k]\!]^{\mathsf{Fam}}\rho)\\
[\![E_1 + E_2]\!]^{\mathsf{Fam}}\rho &=& [\![E_1]\!]^{\mathsf{Fam}}\rho + [\![E_2]\!]^{\mathsf{Fam}}\rho\\
[\![E_1 \times E_2]\!]^{\mathsf{Fam}}\rho &=& [\![E_1]\!]^{\mathsf{Fam}}\rho \times [\![E_2]\!]^{\mathsf{Fam}}\rho\\
[\![(\mu\varphi^k.\lambda\alpha_1...\alpha_k.E)\,E_1...E_k]\!]^{\mathsf{Fam}}\rho &=& (\mu(F \mapsto \lambda Z_1...Z_k.\,[\![E]\!]^{\mathsf{Fam}}\rho\overline{[\alpha_i := Z_i]}[\varphi := F]))\,\overline{([\![E_i]\!]^{\mathsf{Fam}}\rho)}
\end{array}$$

#### **4.5 Induction Rules for Nested Types**

Straightforward generalization of the analysis in Section 4.3 to $\mathcal{N}$ gives induction rules for the type constructors nested types induce. Given a nested type $(\mu\varphi^k.\lambda\alpha_1...\alpha_k.E)\,E_1...E_k \in \mathcal{N}^V$ with type constructor $T = \mu\varphi^k.\lambda\alpha_1...\alpha_k.E$ and a set environment $\sigma \in \mathit{Env}^{\mathsf{Set}}_V$ interpreting its free variables, we have that

$$[\![(\mu\varphi^k.\lambda\alpha_1...\alpha_k.E)\,E_1...E_k]\!]^{\mathsf{Set}}\sigma = (\mu H_\sigma)\,\overline{([\![E_i]\!]^{\mathsf{Set}}\sigma)}$$
where the higher-order functor H<sup>σ</sup> on Set is defined by

$$H_\sigma F A_1...A_k = [\![E]\!]^{\mathsf{Set}}\sigma\overline{[\alpha_i := A_i]}[\varphi := F]$$

For any lifting $\rho$ of $\sigma$, the predicate counterpart to $H_\sigma$ is the higher-order functor $\hat{H}_\rho$ on $\mathsf{Fam}$ whose action on a $k$-ary lifted functor $(F, P)$ is the $k$-ary lifted functor $\hat{H}_\rho(F, P)$ given by

$$\hat{H}_\rho(F, P)(A_1, Q_1)...(A_k, Q_k) = [\![E]\!]^{\mathsf{Fam}}\rho\overline{[\alpha_i := (A_i, Q_i)]}[\varphi := (F, P)]$$

The induction rule $\mathit{ind}_{T,\rho}$ for proving predicates over the $\sigma$-instance of the type constructor $T$ relative to the lifting $\rho$ is thus given by

$$\begin{array}{rl}
\mathit{ind}_{T,\rho} : & \forall (P : \forall(\overline{A_i, Q_i}).\,(\mu H_\sigma)\,\overline{A_i} \to \mathsf{Set}).\\
& \quad (\forall(\overline{A_i, Q_i}).\,\pi_2(\hat{H}_\rho(\mu H_\sigma, P))(\overline{A_i, Q_i}) \to P(\overline{A_i, Q_i})) \to\\
& \quad (\forall(\overline{A_i, Q_i}).\,\pi_2(\mu\hat{H}_\rho)(\overline{A_i, Q_i}) \to P(\overline{A_i, Q_i}))\\
= & \forall (P : \forall(\overline{A_i, Q_i}).\,(\mu H_\sigma)\,\overline{A_i} \to \mathsf{Set}).\\
& \quad (\forall(\overline{A_i, Q_i}).\,\forall(x : H_\sigma(\mu H_\sigma)\,\overline{A_i}).\\
& \qquad \pi_2(\hat{H}_\rho(\mu H_\sigma, P))(\overline{A_i, Q_i})\,x \to P(\overline{A_i, Q_i})(\mathit{in}\;x)) \to\\
& \quad (\forall(\overline{A_i, Q_i}).\,\forall(x : (\mu H_\sigma)\,\overline{A_i}).\,\pi_2(\mu\hat{H}_\rho)(\overline{A_i, Q_i})\,x \to P(\overline{A_i, Q_i})\,x)\\
& \mathit{ind}_{T,\rho}\,P\,k\,(\overline{A_i, Q_i}) = \pi_2(\mathit{fold}^{\mathsf{Fam}}_{T,\rho}\,k)\,(\overline{A_i, Q_i})
\end{array}$$

To get analogues for nested types of the structural induction rules for ADTs note that, since each $\sigma$-instance of the type constructor $T = \mu\varphi^k.\lambda\alpha_1...\alpha_k.E$ associated with a nested type $(\mu\varphi^k.\lambda\alpha_1...\alpha_k.E)\,E_1...E_k \in \mathcal{N}^V$ gives rise to an inductive family of types, the appropriate notion of predicate for a nested type is actually a $\mathsf{Set}$-indexed predicate. By direct analogy with structural induction for ADTs, the structural induction rule for a nested type with type constructor $T$ whose $\sigma$-instance is interpreted by $\mu H_\sigma$ is then

$$\begin{array}{rl}
(\ddagger): & \forall (P : \forall\overline{A_i}.\,(\mu H_\sigma)\,\overline{A_i} \to \mathsf{Set}).\\
& \quad (\forall\overline{A_i}.\,\forall(x : H_\sigma(\mu H_\sigma)\,\overline{A_i}).\,\pi_2(\hat{H}_{\overline{\sigma}}(\mu H_\sigma, \hat{P}))(\overline{A_i, K_1})\,x \to \hat{P}(\overline{A_i, K_1})(\mathit{in}\;x)) \to\\
& \quad (\forall\overline{A_i}.\,\forall(x : (\mu H_\sigma)\,\overline{A_i}).\,\pi_2(\mu\hat{H}_{\overline{\sigma}})(\overline{A_i, K_1})\,x \to \hat{P}(\overline{A_i, K_1})\,x)\\
= & \forall (P : \forall\overline{A_i}.\,(\mu H_\sigma)\,\overline{A_i} \to \mathsf{Set}).\\
& \quad (\forall\overline{A_i}.\,\forall(x : H_\sigma(\mu H_\sigma)\,\overline{A_i}).\,\pi_2(\hat{H}_{\overline{\sigma}}(\mu H_\sigma, \hat{P}))(\overline{A_i, K_1})\,x \to \hat{P}(\overline{A_i, K_1})(\mathit{in}\;x)) \to\\
& \quad (\forall\overline{A_i}.\,\forall(x : (\mu H_\sigma)\,\overline{A_i}).\,P\,\overline{A_i}\,x)
\end{array}$$

where $\hat{P}$ is defined below. To see that the structural induction rule $(\ddagger)$ is indeed a specialization of $\mathit{ind}_{T,\rho}$, suppose we are given a predicate $P : \forall\overline{A_i}.\,(\mu H_\sigma)\,\overline{A_i} \to \mathsf{Set}$ for a nested type with type constructor $T$ whose $\sigma$-instance is interpreted by $\mu H_\sigma$, together with induction hypotheses

$$R : \forall\overline{A_i}.\,\forall(x : H_\sigma(\mu H_\sigma)\,\overline{A_i}).\,\pi_2(\hat{H}_{\overline{\sigma}}(\mu H_\sigma, \hat{P}))(\overline{A_i, K_1})\,x \to \hat{P}(\overline{A_i, K_1})(\mathit{in}\;x)$$

Let $\hat{P} : \forall(\overline{A_i, Q_i}).\,(\mu H_\sigma)\,\overline{A_i} \to \mathsf{Set}$ be the $\mathsf{Fam}$-indexed predicate $\hat{P} = \lambda(\overline{A_i, Q_i}).\,P\,\overline{A_i}$, and consider the instantiation $\mathit{ind}_{T,\overline{\sigma}}\,\hat{P}\,\hat{R}$, where the induction hypothesis $\hat{R} : \forall(\overline{A_i, Q_i}).\,\forall(x : H_\sigma(\mu H_\sigma)\,\overline{A_i}).\,\pi_2(\hat{H}_{\overline{\sigma}}(\mu H_\sigma, \hat{P}))(\overline{A_i, Q_i})\,x \to \hat{P}(\overline{A_i, Q_i})(\mathit{in}\;x)$ for $\mathit{ind}_{T,\overline{\sigma}}$ is given by $\hat{R}\,(\overline{A_i, Q_i})\,x\,y = R\,\overline{A_i}\,x\,(\pi_2(\hat{H}_{\overline{\sigma}}(\mu H_\sigma, \hat{P}))\,x\,y)$.

## **5 The General Methodology**

We can distill from the foundations given in Section 4 a general methodology that will derive correct deep induction rules for any nested type generated by N . Concretely, this methodology comprises the following steps:


These are precisely the steps carried out in all of our examples, including those below, which illustrate the derivation for nested types given in Section 4.5.

Example 4. Since the nested type $\mathsf{Lam}\,\alpha := (\mu\varphi^1.\lambda\beta.\,\beta + \varphi\beta \times \varphi\beta + \varphi(\beta+1))\,\alpha$ of lambda terms is uniform in its index $\alpha$, it induces a type constructor $\mathsf{Lam} := \mu\varphi^1.\lambda\beta.\,\beta + \varphi\beta \times \varphi\beta + \varphi(\beta+1)$. Writing $H$ for $H_{[]}$ and $\hat{H}$ for $\hat{H}_{[]}$, and letting

$$H F A = [\![\beta + \varphi\beta \times \varphi\beta + \varphi(\beta+1)]\!]^{\mathsf{Set}}[\beta := A][\varphi := F] = A + FA \times FA + F(A+1)$$

we have that $\mu H = \mathsf{Lam}$ and that the predicate counterpart $\hat{H}$ to $H$ is given by

$$\begin{array}{rl}
\hat{H}(F, \hat{P})(A, Q) &= [\![\beta + \varphi\beta \times \varphi\beta + \varphi(\beta+1)]\!]^{\mathsf{Fam}}[\beta := (A, Q)][\varphi := (F, \hat{P})]\\
&= (A, Q) + (F, \hat{P})(A, Q) \times (F, \hat{P})(A, Q) + (F, \hat{P})((A, Q) + (1, K_1))\\
&= (A + FA \times FA + F(A+1),\\
&\qquad \pi_2((A, Q) + (F, \hat{P})(A, Q) \times (F, \hat{P})(A, Q) + (F, \hat{P})((A, Q) + (1, K_1))))
\end{array}$$

Reflecting $\mu\hat{H}$ back into syntax gives the inductive predicate

    Lam∧ : ∀ (a : Set) → (a → Set) → (Lam a → Set) where
      Var∧ : ∀ (a : Set) (Q : a → Set) (x : a) → Q x → Lam∧ a Q (Var x)
      App∧ : ∀ (a : Set) (Q : a → Set) (x : Lam a) (y : Lam a) →
             Lam∧ a Q x → Lam∧ a Q y → Lam∧ a Q (App x y)
      Abs∧ : ∀ (a : Set) (Q : a → Set) (x : Lam (Maybe a)) →
             Lam∧ (Maybe a) (Maybe∧ a Q) x → Lam∧ a Q (Abs x)

Now, if $P$ is any other predicate on $\mathsf{Lam}$ admitting an $\hat{H}$-algebra structure, then there must exist a morphism $k : \forall(x : A + \mathsf{Lam}\,A \times \mathsf{Lam}\,A + \mathsf{Lam}(A+1)).\,(Q + P A Q \times P A Q + P(A+1)((+1)^\wedge\,Q))\,x \to P A Q\,(\mathit{in}\;x)$, i.e., $k = (k_1, k_2, k_3)$, where

$$\begin{array}{l}
k_1 : \forall(x : A).\,Q\,x \to P A Q\,(\mathsf{Var}\;x)\\
k_2 : \forall(x : \mathsf{Lam}\,A).\,\forall(y : \mathsf{Lam}\,A).\,P A Q\,x \to P A Q\,y \to P A Q\,(\mathsf{App}\;x\;y)\\
k_3 : \forall(x : \mathsf{Lam}\,(A+1)).\,P(A+1)((+1)^\wedge\,Q)\,x \to P A Q\,(\mathsf{Abs}\;x)
\end{array}$$

Since $\mathsf{Lam}^\wedge$ reflects the initial $\hat{H}$-algebra, there is a unique algebra morphism from $\mathit{in} : \hat{H}(\mu\hat{H}) \to \mu\hat{H}$ to the $\hat{H}$-algebra $k$ on $P$, i.e., from $\mu\hat{H}$ to $P$. Reflecting this morphism back into syntax gives the deep induction rule for lambda terms from Section 3.

The deep induction rule for lambda terms can be used to prove, e.g., properties of lambda terms whose variables are represented by prime numbers, or of lambda terms over strings that can represent variable names. It can also be used to prove properties of lambda terms over lambda terms, such as the associativity laws needed to show that the functor Lam is a monad; such a proof is included as the first case study in the accompanying Agda code. The second case study uses the deep induction rule we derive in Example 5 to prove some results about bushes.
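The shape of the inductive predicate Lam∧ is mirrored at the value level by a Boolean "everywhere" function, which threads the lifted predicate through Abs exactly as the Abs∧ clause does. The names `maybeAll` and `lamAll` below are ours; the block is a sketch of the idea, not code from the paper's Agda development:

```haskell
data Lam a = Var a | App (Lam a) (Lam a) | Abs (Lam (Maybe a))

-- Boolean analogue of the lifted predicate Maybe∧ (i.e. (+1)∧):
-- the freshly bound variable (Nothing) trivially satisfies it.
maybeAll :: (a -> Bool) -> Maybe a -> Bool
maybeAll _ Nothing  = True
maybeAll q (Just x) = q x

-- Boolean analogue of Lam∧: q holds of every free variable of a term.
-- Under Abs the recursive call is at type Maybe a with the lifted
-- predicate, matching the Abs∧ clause of the inductive predicate.
lamAll :: (a -> Bool) -> Lam a -> Bool
lamAll q (Var x)   = q x
lamAll q (App s t) = lamAll q s && lamAll q t
lamAll q (Abs t)   = lamAll (maybeAll q) t
```

For instance, `lamAll even (Abs (App (Var Nothing) (Var (Just 2))))` holds: the bound variable passes trivially and the free variable 2 is even.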

Since truly nested types are a special case of deep nested types, our methodology can derive useful induction rules for them — including the perpetually problematic truly nested type of bushes [8,10,15] introduced in Section 3.

Example 5. Since the truly nested type $\mathsf{Bush}\,\alpha := (\mu\varphi^1.\lambda\beta.\,1 + \beta \times \varphi(\varphi\,\beta))\,\alpha \in \mathcal{N}^{\alpha}$ is uniform in its index $\alpha$, it induces a type constructor $\mathsf{Bush} := \mu\varphi^1.\lambda\beta.\,1 + \beta \times \varphi(\varphi\,\beta)$. Writing $H$ for $H_{[]}$ and $\hat{H}$ for $\hat{H}_{[]}$, and letting

$$H F A = [\![1 + \beta \times \varphi(\varphi\,\beta)]\!]^{\mathsf{Set}}[\beta := A][\varphi := F] = 1 + A \times F(FA)$$

we have that $\mu H = \mathsf{Bush}$ and the predicate counterpart $\hat{H}$ to $H$ is given by

$$\begin{array}{rl}
\hat{H}(F, P)(A, Q) &= [\![1 + \beta \times \varphi(\varphi\,\beta)]\!]^{\mathsf{Fam}}[\beta := (A, Q)][\varphi := (F, P)]\\
&= (1, K_1) + (A, Q) \times (F, P)((F, P)(A, Q))\\
&= (1 + A \times F(FA),\; K_1 + Q \times \pi_2((F, P)((F, P)(A, Q))))
\end{array}$$

Reflecting $\mu\hat{H}$ back into syntax gives the inductive predicate

    Bush∧ : ∀ (a : Set) → (a → Set) → (Bush a → Set) where
      BNil∧  : ∀ (a : Set) (Q : a → Set) → Bush∧ a Q BNil
      BCons∧ : ∀ (a : Set) (Q : a → Set) (x : a) (y : Bush (Bush a)) →
               Q x → Bush∧ (Bush a) (Bush∧ a Q) y → Bush∧ a Q (BCons x y)

Now, if $P$ is any other predicate on $\mathsf{Bush}$ admitting an $\hat{H}$-algebra structure, then there must exist a morphism

$$\begin{array}{rl}
k : & \forall(x : 1 + A \times \mathsf{Bush}\,(\mathsf{Bush}\,A)).\\
& \quad (K_1 + Q \times \pi_2((\mathsf{Bush}, P)((\mathsf{Bush}, P)(A, Q))))\,x \to P A Q\,(\mathit{in}\;x)\\
= & \forall(x : 1 + A \times \mathsf{Bush}\,(\mathsf{Bush}\,A)).\,(K_1 + Q \times P\,(\mathsf{Bush}\,A)\,(P A Q))\,x \to P A Q\,(\mathit{in}\;x)
\end{array}$$

i.e., $k = (k_1, k_2)$, where $k_1 : \forall(x : 1).\,K_1\,x \to P A Q\,\mathsf{BNil}$ and $k_2 : \forall(x : A).\,\forall(y : \mathsf{Bush}\,(\mathsf{Bush}\,A)).\,Q\,x \to P\,(\mathsf{Bush}\,A)(P A Q)\,y \to P A Q\,(\mathsf{BCons}\;x\;y)$. Since $\mathsf{Bush}^\wedge$ reflects the initial $\hat{H}$-algebra, there is a unique predicate morphism from $\mu\hat{H}$ to $P$. Reflecting this morphism back into syntax gives the deep induction rule for bushes from Section 3.

The function BDind⇒MBDind in our Agda code shows that our methodology also derives a mutually recursive deep induction rule for bushes, there called MBDind.

Examples 4 and 5 show that when the definition of a nested type contains an instance of another nested type constructor C — e.g., Maybe a in the argument Lam (Maybe a) to Abs — its inductive predicate definition, and thus its deep induction rule, will involve a call to the predicate interpretation C<sup>∧</sup> of C. When the definition contains an instance of the constructor for the same type being defined — e.g., Bush a in the type argument Bush (Bush a) to BCons — its inductive predicate definition, and thus its deep induction rule, will involve a recursive call to the inductive predicate being defined. The treatment of a truly nested type is thus exactly the same as the treatment of any other nested type.
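The recursive call to the inductive predicate in the BCons∧ clause has the same shape as a Boolean "everywhere" function for bushes. The name `bushAll` is ours; note the nested call `bushAll (bushAll q)`, which is exactly the self-referential lifting described above:

```haskell
data Bush a = BNil | BCons a (Bush (Bush a))

-- Boolean analogue of Bush∧: q holds of every element of the bush.
-- The tail has type Bush (Bush a), so the recursive call lifts q to
-- the predicate (bushAll q) on Bush a, mirroring the BCons∧ clause.
bushAll :: (a -> Bool) -> Bush a -> Bool
bushAll _ BNil         = True
bushAll q (BCons x ys) = q x && bushAll (bushAll q) ys
```

For example, `bushAll even (BCons 2 (BCons (BCons 4 BNil) BNil))` holds, since every stored element (2 and 4, at different depths of nesting) is even.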

Independently of deriving induction rules, even defining some nested types in Agda requires turning off its termination checks in a few tightly compartmentalized places. For example, neither Coq nor Agda currently allows the definition of the bush data type because of the non-positive occurrence of Bush in the type of BCons. The correctness of our development in those places is justified by [13]. This work suggests that the current notion of positivity should be generalized.

## **6 Related Work and Directions for Further Investigation**

As far as we know, the phenomenon of deep induction has not previously even been identified, let alone studied. This paper treats deep induction for nested types, which extend ADTs by allowing higher-order recursion. Other generalizations of ADTs are also well-studied in the literature, including (indexed) containers [1,2], which extend ADTs by allowing type dependency. In particular, [3] defines a class of "nested" containers corresponding to inductive types whose constructors can recursively depend on the data type at different instances than the one being defined. The case of truly nested types is not treated there, however. We hope eventually to extend the results of this paper to derive provably correct deep induction rules for (indexed) containers, GADTs, dependent types, and other classes of more advanced data types. One interesting question is whether or not a common generalization of indexed containers and the class of nested types studied here has a rigorous initial algebra semantics as in [13].

A more recent line of investigation concerns sized types [5]. These are particularly well-suited to termination checking of (co)recursive definitions, and are implemented in the latest versions of Agda [6]. Although originally defined in the context of a type theory with higher-order functions [4], the current incarnation of sized types does not appear to admit definitions with true nesting. What seems to be missing is an addition operation on sizes, which would allow a constructor such as BCons to combine a structure with size of depth "up to α" with one of depth "up to β" to define a data element of depth "up to α + β".

Tassi [17] has independently implemented a tool for deriving induction principles for data type definitions in Coq using unary parametricity. Although neither a rigorous derivation nor a justification is provided, his technique seems to be essentially equivalent to ours, and could perhaps be justified by our general framework. True nesting is still not permitted, however. In [7], mutually recursive induction and coinduction rules are derived for mutually recursive and corecursive data types. But these are still the standard structural (co)induction rules, rather than deep ones. This suggests a need for deep coinduction rules, too.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Exponential Automatic Amortized Resource Analysis**

David M. Kahn and Jan Hoffmann

Carnegie Mellon University, Pittsburgh, PA, USA
davidkah@cs.cmu.edu, cs.cmu.edu/~davidkah
jhoffmann@cmu.edu, cs.cmu.edu/~janh

**Abstract.** Automatic amortized resource analysis (AARA) is a type-based technique for inferring concrete (non-asymptotic) bounds on a program's resource usage. Existing work on AARA has focused on bounds that are polynomial in the sizes of the inputs. This paper presents an extension of AARA to exponential bounds that preserves the benefits of the technique, such as compositionality and efficient type inference based on linear constraint solving. A key idea is the use of the Stirling numbers of the second kind as the basis of potential functions, which play the same role as the binomial coefficients in polynomial AARA. To formalize the similarities with the existing analyses, the paper presents a general methodology for AARA that is instantiated to the polynomial version, the exponential version, and a combined system with potential functions that are formed by products of Stirling numbers and binomial coefficients. The soundness of exponential AARA is proved with respect to an operational cost semantics, and the analysis of representative example programs demonstrates the effectiveness of the new analysis.

**Keywords:** Functional programming · Resource consumption · Quantitative analysis · Amortized analysis · Stirling numbers · Exponential

## **1 Introduction**

"Time is money" is a phrase that also applies to executing software, most directly in domains such as on-demand cloud computing and smart contracts where execution comes with an explicit price tag. In such domains, there is an increasing interest in formally analyzing and certifying the precise resource usage of programs. However, the cost of formally verifying properties by hand is an obstacle to even getting projects off the ground. For this reason, it would be desirable if such resource analyses could be performed mostly automatically, with reduced burden on the programmer.

<sup>-</sup> This article is based on research supported by DARPA under AA Contract FA8750-18-C-0092 and by the National Science Foundation under SaTC Award 1801369, SHF Award 1812876, and CAREER Award 1845514. Any opinions, findings, and conclusions contained in this document are those of the authors and do not necessarily reflect the views of the sponsoring organizations.

Techniques and tools for automatic and semi-automatic resource analysis have been extensively studied. The applied methods range from deriving and analyzing recurrence relations [55, 1, 16, 2, 12, 36, 10, 37], to abstract interpretation and static analysis [18, 7, 49, 39], to type systems [11, 56, 53], to proof assistants and program logics [4, 9, 8, 48, 19, 45, 42], to term rewriting [6, 5, 47]. Many techniques focus on worst-case upper bounds, but average-case bounds [15, 35, 43, 54] and lower bounds [3, 17, 44] have also been studied.

In this paper, we extend automatic amortized resource analysis (AARA) to cover *exponential* worst-case bounds. AARA is an effective type-based technique for deriving concrete (non-asymptotic) worst-case bounds, in particular for functional languages. It was introduced by Hofmann and Jost [31] to derive *linear bounds* on the heap-space usage of strict first-order functional programs with lists. Subsequently, AARA has been extended to programs with recursive types and general resource metrics [34], higher-order functions [33], lazy evaluation [52], parallel evaluation [29], univariate polynomial bounds [27], multivariate polynomial bounds [23, 25], session-typed concurrency [13], and side effects [38, 46]. However, none of the aforementioned works explores exponential bounds.

The idea of AARA is to enrich types with numeric annotations that represent coefficients in a potential function in the sense of amortized analysis [51]. Bound inference is reduced to Hindley-Milner type inference extended with linear constraints for the numeric annotations. Advantages of the technique include compositionality, efficient bound inference via off-the-shelf LP solving, and the ability to derive bounds on the high-water mark for non-monotone resources like memory. A powerful innovation leveraged in polynomial AARA is the representation of potential functions as non-negative linear combinations of binomial coefficients. Their combinatorial identities yield simple and local typing rules and support a natural semantic understanding of types and bounds. Moreover, these potential functions are more expressive than non-negative linear combinations of the standard polynomial basis.

However, polynomial potential is not always enough. Functional languages make it particularly easy to use exponentially many resources just by having two or more recursive calls. The following function subsetSum : int list → int → bool exemplifies this by naively solving the well-known NP-complete problem subset sum. In the worst case, it performs $3 \cdot 2^{|\mathit{nums}|} - 2$ Boolean and arithmetic operations (where $|x|$ gives the length of the list $x$).

```
let subsetSum nums target =
   match nums with
   | [] → target = 0
   | hd::tl → subsetSum tl (target-hd) || subsetSum tl target
```
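The stated count can be checked by instrumenting the function to return the number of operations alongside its result. The Haskell sketch below uses one cost model consistent with the stated figure (one unit for the comparison in the nil case, one each for the subtraction and the disjunction in the cons case); the name `subsetSumCost` is ours:

```haskell
-- subsetSum instrumented with an operation count.  The count obeys the
-- recurrence c(0) = 1 and c(n) = 2*c(n-1) + 2, whose closed form is
-- c(n) = 3*2^n - 2.
subsetSumCost :: [Int] -> Int -> (Bool, Int)
subsetSumCost []      target = (target == 0, 1)          -- one comparison
subsetSumCost (hd:tl) target =
  let (b1, c1) = subsetSumCost tl (target - hd)          -- one subtraction
      (b2, c2) = subsetSumCost tl target
  in (b1 || b2, c1 + c2 + 2)                             -- one disjunction
```

For a three-element list the count is $3 \cdot 2^3 - 2 = 22$, e.g. `snd (subsetSumCost [1,2,3] 5) == 22`.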
Such a function could appear in a program with polynomial resource usage if applied to arguments of logarithmic size. In this case, polynomial AARA would not be able to derive a bound. Section 6 contains a relevant example.

To handle such functions, we introduce an extension to AARA that allows working with potential functions of the form $f(n) = b^n$. This extension exploits the combinatorial properties of *Stirling numbers of the second kind* [50] in much the same way that AARA currently exploits those of binomial coefficients. Moreover, we allow both multiplicative and additive mixtures of exponential and polynomial potential functions. The techniques used in this process could easily be applied to other potential functions in the future.
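A quick sanity check of why Stirling numbers of the second kind can serve as a basis for exponential potential is the standard identity $b^n = \sum_k S(n,k)\, b^{\underline{k}}$, which converts falling factorials into powers. The sketch below (function names are ours, and this is only the combinatorial identity, not the paper's type system) verifies it numerically:

```haskell
-- Stirling numbers of the second kind via the usual recurrence
-- S(n, k) = k * S(n-1, k) + S(n-1, k-1).
stirling2 :: Integer -> Integer -> Integer
stirling2 0 0 = 1
stirling2 0 _ = 0
stirling2 _ 0 = 0
stirling2 n k = k * stirling2 (n - 1) k + stirling2 (n - 1) (k - 1)

-- Falling factorial b^(k) = b * (b-1) * ... * (b-k+1).
falling :: Integer -> Integer -> Integer
falling b k = product [b - k + 1 .. b]

-- The identity b^n = sum_k S(n,k) * falling b k, which lets an
-- exponential function be expressed over a Stirling-style basis.
powerViaStirling :: Integer -> Integer -> Integer
powerViaStirling b n = sum [stirling2 n k * falling b k | k <- [0 .. n]]
```

For instance, `powerViaStirling 3 5` recovers $3^5 = 243$.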

The paper first details a generalized AARA type system fit for reuse between polynomial, exponential, and other potential functions. We then instantiate this system with Stirling numbers of the second kind, yielding the first AARA that can infer exponential resource bounds. Finally, we pick out the characteristics that allow for mixing different families of potential functions and maximizing the space they express, and we instantiate the general system with products of exponential and polynomial potential functions. To focus on the main contribution, we develop the system for a simple first-order language with lists in which resource usage is defined with explicit *tick* expressions. However, we are confident that the results smoothly generalize to more general resource metrics, recursive types, and higher-order functions. As in previous work, we prove the soundness of the analysis with respect to a big-step cost semantics that models the high-water mark of the resource usage.

## **2 Language and Cost Semantics**

*Abstract Syntax* To begin, we define an abstract binding tree (ABT, see [20]) underlying a simple strict first-order functional language. Expressions are in let-normal form to simplify the AARA typing rules. For code examples, however, we overlay the ABT with corresponding ML-based syntax. For example, 1::[], [1], and *cons*(1, *nil*) all represent the same list.

A program *prog* is a collection of functions as defined in the following grammar. The symbols *lit*, *binop*, and *unop* refer to standard literal values, binary operations, and unary operations respectively, of *b*asic types (*int*, *bool*, etc.). The symbols f, x, and r refer to function identifiers, variables, and rational numbers, respectively.

$$\begin{aligned} \mathit{prog} &::= \mathit{func}\{f\}(x.e)\;\mathit{prog} \mid \epsilon \\ e &::= \mathit{lit} \mid x \mid \mathit{binop}(x_1; x_2) \mid \mathit{unop}(x) \mid \mathit{app}\{f\}(x) \mid \mathit{let}(e_1; x.e_2) \\ &\quad\mid \mathit{share}(x_1; x_2, x_3.e) \mid \mathit{tick}\{r\} \mid \mathit{pair}(x_1; x_2) \mid \mathit{nil} \mid \mathit{cons}(x_1; x_2) \\ &\quad\mid \mathit{cond}(x; e_1; e_2) \mid \mathit{pairMat}(x_1; x_2, x_3.e) \mid \mathit{listMat}(x_1; e_1; x_2, x_3.e_2) \end{aligned}$$

Expressions include function applications, conditionals, and the usual introduction and elimination forms for pairs and lists. They also include two special expressions: *tick*{r} and *share*. The former, *tick*{r}, is used to specify constant resource cost r. We allow r to be negative in the case of resources becoming available instead of being consumed. The latter, *share*(x1; x2, x3.e), provides two copies of its argument x<sup>1</sup> for use in e. This is useful because the affine features of the AARA type system do not allow naive variable reuse. In practice, *share* can be left implicit by automatically preceding every variable usage by *share*.

To focus on the technical novelties, we keep function identifiers and variables disjoint, that is, the types of variables do not contain arrow types and functions are first-order. Higher-order functions can be handled as in previous AARA literature [25]. As a further simplification, we only let functions take one argument; multiple arguments can be simulated with nested pairs. Finally, the language here only supports the inductive types of lists; future work could extend this to more general types as in other AARA literature [38, 25, 30, 28].
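A few representative constructors of this grammar can be transcribed directly into a host language. The following Python sketch is ours (constructor and field names are not fixed by the paper); binders are represented by pairing a bound variable name with the expression it scopes over, as in the ABT.

```python
from dataclasses import dataclass

# A minimal transcription of part of the expression grammar.
@dataclass
class Tick:      # tick{r}: constant resource cost r (may be negative)
    r: float

@dataclass
class Var:       # a variable occurrence
    name: str

@dataclass
class Let:       # let(e1; x.e2): bind the value of e1 to x in e2
    e1: object
    x: str
    e2: object

@dataclass
class Share:     # share(x1; x2, x3.e): two copies of x1 for use in e
    x1: str
    x2: str
    x3: str
    e: object

@dataclass
class Cons:      # cons(x1; x2): list construction from variables
    head: str
    tail: str
```

Because the language is in let-normal form, constructors like `Cons` take variable names rather than arbitrary subexpressions.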

*Operational Cost Semantics* To define resource usage, AARA literature uses the operational big-step judgment V ⊢ e ⇓ v | (q, q′) (see e.g. [22]) defined in Figure 1. This judgment means that, under the environment V, the expression e evaluates to the value v under the resource constraints given by the pair (q, q′). The environment V maps variables to values. The resource constraints are that q is the high-water mark of resource usage, and q − q′ is the net amount of resources consumed during evaluation. In other words, if one started with exactly as many resources as needed to evaluate e, that amount would be q, and the amount of leftover resources after evaluation would be q′. It is essential to track both of these values to model resources that might be returned after use, like space. Space usage usually has a positive high-water mark but no net resource consumption, as space can be reused.
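The way resource pairs (q, q′) compose sequentially, as in the *Let* case of Figure 1, can be sketched executably. This Python encoding is ours (the names `tick` and `seq` are not from the paper); it shows how a resource freed mid-evaluation can be reused without raising the high-water mark.

```python
# A cost is a pair (q, q'): q is the high-water mark of resource
# usage, q' the amount left over afterwards; q - q' is the net cost.

def tick(r):
    # tick{r} evaluates with cost (max(r, 0), max(-r, 0)).
    return (max(r, 0.0), max(-r, 0.0))

def seq(a, b):
    # Sequential composition, matching the Let rule: run a, then
    # run b starting from a's leftover resources.
    (q, q1), (p, p1) = a, b
    return (q + max(p - q1, 0.0), p1 + max(q1 - p, 0.0))
```

For example, `seq(seq(tick(1.0), tick(-1.0)), tick(1.0))` evaluates to `(1.0, 0.0)`: two units are consumed overall, but the unit returned by the middle tick is reused, so the high-water mark stays at 1.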

The above big-step judgment only formalizes terminating evaluations. To deal with divergence, the additional judgment V ⊢ e ⇓◦ | q has been introduced [26]. This judgment merely drops the parts of the previous judgment relevant to post-termination, focusing on partial evaluation. It means that some partial evaluation of e uses a high-water mark of q resources. Should it exist, the largest q such that V ⊢ e ⇓◦ | q holds is the high-water mark of resource usage across all partial evaluations of e. For a formal definition, see [26].

## **3 Automatic Amortized Resource Analysis**

Here we lay out a generalized version of the AARA system with the potential functions abstracted. Existing AARA literature is specialized to polynomial functions (see e.g. [27]). This existing polynomial system may be obtained as an instantiation, as may the exponential system that we introduce in Section 4.

AARA uses the *potential* (or physicist's) method to account for resource use, as is commonly used in amortized analyses. The potential method uses the physical analogy of converting between potential and actual energy that can be used to perform work. Whereas a physicist might find potential in the chemical bonds of a fuel, however, AARA places it in the constructors of lists.

To prime intuition with an example, consider paying one resource for each :: operation performed in the following code. It defines *snoc*, which is like *cons* but adds onto the back of the list rather than the front.


The resource consumption of *snoc* x xs as defined by the *tick* expressions is 1 + |xs|. Using the potential method, we can justify this bound as follows. If 1 resource is initially available, then the base case of the empty list can be paid for. If 1 resource is stored per element of the list, then 1 resource is released in the cons case of the pattern match. This suffices to pay for the additional ::

#### **Fig. 1.** Terminating operational cost semantics rules.

$$\begin{array}{c}
\dfrac{q = \max(r, 0) \quad q' = \max(-r, 0)}{V \vdash \operatorname{tick}\{r\} \Downarrow () \mid (q, q')}\ \textit{Tick}
\qquad
\dfrac{}{V \vdash \mathit{lit} \Downarrow \mathit{lit} \mid (0, 0)}\ \textit{Lit}
\qquad
\dfrac{}{V \vdash x \Downarrow V(x) \mid (0, 0)}\ \textit{Var}
\\[2ex]
\dfrac{\operatorname{binop}(V(x_1), V(x_2)) \to v}{V \vdash \operatorname{binop}(x_1; x_2) \Downarrow v \mid (0, 0)}\ \textit{Binop}
\qquad
\dfrac{\operatorname{unop}(V(x)) \to v}{V \vdash \operatorname{unop}(x) \Downarrow v \mid (0, 0)}\ \textit{Unop}
\qquad
\dfrac{}{V \vdash \operatorname{pair}(x_1; x_2) \Downarrow (V(x_1), V(x_2)) \mid (0, 0)}\ \textit{Pair}
\\[2ex]
\dfrac{\operatorname{func}\{f\}(x'.e) \in \mathit{prog} \quad V[x' \mapsto V(x)] \vdash e \Downarrow v \mid (q, q')}{V \vdash \operatorname{app}\{f\}(x) \Downarrow v \mid (q, q')}\ \textit{App}
\qquad
\dfrac{V(x_1) = v \quad V[x_2 \mapsto v, x_3 \mapsto v] \vdash e \Downarrow v' \mid (q, q')}{V \vdash \operatorname{share}(x_1; x_2, x_3.e) \Downarrow v' \mid (q, q')}\ \textit{Share}
\\[2ex]
\dfrac{V \vdash e_1 \Downarrow v_1 \mid (q, q') \quad V[x \mapsto v_1] \vdash e_2 \Downarrow v_2 \mid (p, p')}{V \vdash \operatorname{let}(e_1; x.e_2) \Downarrow v_2 \mid (q + \max(p - q', 0),\ p' + \max(q' - p, 0))}\ \textit{Let}
\\[2ex]
\dfrac{}{V \vdash \operatorname{nil} \Downarrow [\,] \mid (0, 0)}\ \textit{Nil}
\qquad
\dfrac{}{V \vdash \operatorname{cons}(x_1; x_2) \Downarrow V(x_1) :: V(x_2) \mid (0, 0)}\ \textit{Cons}
\qquad
\dfrac{V(x) = (v_1, v_2) \quad V[x_1 \mapsto v_1, x_2 \mapsto v_2] \vdash e \Downarrow v \mid (q, q')}{V \vdash \operatorname{pairMatch}(x; x_1, x_2.e) \Downarrow v \mid (q, q')}\ \textit{PMat}
\\[2ex]
\dfrac{V(x) = \mathit{true} \quad V \vdash e_1 \Downarrow v \mid (q, q')}{V \vdash \operatorname{cond}(x; e_1; e_2) \Downarrow v \mid (q, q')}\ \textit{CondT}
\qquad
\dfrac{V(x) = \mathit{false} \quad V \vdash e_2 \Downarrow v \mid (q, q')}{V \vdash \operatorname{cond}(x; e_1; e_2) \Downarrow v \mid (q, q')}\ \textit{CondF}
\\[2ex]
\dfrac{V(x) = [\,] \quad V \vdash e_1 \Downarrow v \mid (q, q')}{V \vdash \operatorname{listMatch}(x; e_1; x_h, x_t.e_2) \Downarrow v \mid (q, q')}\ \textit{LMat0}
\qquad
\dfrac{V(x) = v_h :: v_t \quad V[x_h \mapsto v_h, x_t \mapsto v_t] \vdash e_2 \Downarrow v \mid (q, q')}{V \vdash \operatorname{listMatch}(x; e_1; x_h, x_t.e_2) \Downarrow v \mid (q, q')}\ \textit{LMat1}
\end{array}$$

operation. The remaining potential on xs can be assigned to tl for the recursive call. One can sum these costs to infer that the initial potential 1 + |xs| covers the cost of all the :: operations. The AARA type system could describe this with the typing $L^1(\mathbb{Z})$ for xs (describing the linear potential in the superscript) and $\mathbb{Z} \times L^1(\mathbb{Z}) \stackrel{1/0}{\to} L^0(\mathbb{Z})$ for *snoc* (describing the initial/remaining resources above the arrow). Another valid type is $\mathbb{Z} \times L^2(\mathbb{Z}) \stackrel{1/0}{\to} L^1(\mathbb{Z})$, which could be used in a context where the result of *snoc* must be used to pay for additional cost.

*Types* The AARA system laid out here supports the types given below. The symbol F gives the types of functions, where q and q′ are non-negative rationals. The symbol S gives the remaining non-function types, where *basic* stands for the basic types like *int* or *unit*, and the resource annotation P is an indexed family of rationals representing the coefficients in a linear combination of basic potential functions.

$$F ::= \ S \stackrel{q/q'}{\to} S \qquad\qquad \qquad S ::= \ basic \mid L^P(S) \mid S \times S$$

The typing rules for these types are given in Figure 2 and explained in the following sections. The values of these types are the usual values.

*Potential* To understand the typing rules, it is necessary to define potential. The following potential constructs are generalized from polynomial AARA work [27].

As mentioned, $P = (p_i)_{i \in I}$ is in $\mathbb{Q}^I$ as an indexed family of rationals. Each entry represents a coefficient in a linear combination of basic potential functions. This linearity makes it natural to overload the type of P as a vector or matrix of rationals, so it is treated as such whenever the context is appropriate. Finally, let those basic potential functions be fixed as some family $(f_i)_{i \in I}$, where $f_i(0) = 0$.

We define the potential represented with P using the function φ where

$$\phi(n, P) = \sum_{i} p_i \cdot f_i(n)$$

The function φ yields the total potential on a list (excluding the potential of its elements) as a function of the list's size n and its potential annotation P.

We can then relate resource potential between different sizes of list with the shift operator $\lhd : \mathbb{Q}^I \to \mathbb{Q}^I$ and constant difference operator $\delta : \mathbb{Q}^I \to \mathbb{Q}$. These functions need only satisfy the following equation.

$$
\phi(n+1, P) = \delta(P) + \phi(n, \lhd P) \tag{1}
$$

Though we leave the explicit definition of these functions open for generality, we later work only with instances of them that are linear operators, so that Equation 1 denotes a linear recurrence. Such a refinement makes $\lhd P$ and $\delta(P)$ linear functions of P.

These functions come in handy for understanding the stored potential in a value of a given type, defined by the potential function Φ as follows.

$$\begin{aligned} \Phi(v : \mathit{basic}) &= 0\\ \Phi((v_1, v_2) : A_1 \times A_2) &= \Phi(v_1 : A_1) + \Phi(v_2 : A_2) \\ \Phi([\,] : L^P(A)) &= 0 \\ \Phi(h :: t : L^P(A)) &= \delta(P) + \Phi(h : A) + \Phi(t : L^{\lhd P}(A)) \end{aligned}$$

We often need to measure the potential across an entire evaluation context of typed values V : Γ given by a typing context Γ and variable bindings V . We do so by extending the definition of potential Φ as follows.

$$\Phi(V : \emptyset) = 0 \qquad \qquad \Phi(V : (\Gamma, x : A)) = \Phi(V : \Gamma) + \Phi(V(x) : A)$$

Finally, we can use these definitions to obtain a closed-form expression for the potential over an entire list (including its elements) with the following:

**Lemma 1.** *Let* $l = [a_n, \ldots, a_1]$ *be a list of* n *values. Then* $\Phi(l : L^{P}(A)) = \phi(n, P) + \sum_{i=1}^{n} \Phi(a_i : A)$.

*Proof.* We induct over the structure of the list l.

For the empty list of length 0:

$$\Phi([\,] : L^{P}(A)) = 0 = \sum_{i} p_{i} \cdot f_{i}(0) = \phi(0, P) + \sum_{i=1}^{0} \Phi(a_{i} : A)$$

For $l = a_{n+1} :: l'$ of size $n+1$, where $l' = [a_n, \ldots, a_1]$:

$$\begin{aligned} \Phi(a_{n+1} :: l' : L^{P}(A)) &= \delta(P) + \Phi(a_{n+1} : A) + \Phi(l' : L^{\lhd P}(A)) \\ &= \delta(P) + \Phi(a_{n+1} : A) + \phi(n, \lhd P) + \sum_{i=1}^{n} \Phi(a_{i} : A) \\ &= \phi(n+1, P) + \sum_{i=1}^{n+1} \Phi(a_{i} : A) \end{aligned}$$
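For the linear instance (a single basis function λn.n, with ◁(p) = (p) and δ((p)) = p), Lemma 1 can be spot-checked numerically. The Python encoding below is our sketch, taking element potential to be 0 (basic-typed elements):

```python
# Linear potential: one basis function f(n) = n, so phi(n, (p,)) = p * n.

def phi(n, p):
    return p * n

def potential(lst, p):
    # Recursive definition of Phi on lists:
    # Phi(h :: t : L^P) = delta(P) + Phi(t : L^{<|P}),
    # and here delta((p)) = p and <|(p) = (p).
    if not lst:
        return 0
    return p + potential(lst[1:], p)
```

For any list `l` and coefficient `p`, `potential(l, p)` agrees with the closed form `phi(len(l), p)` of Lemma 1.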

We can apply Lemma 1 to the previously defined function *snoc* to see the change in potential between input and output. This difference in potential should bound the resources consumed. For this case, the basic potential functions $(f_i)$ need only contain $\lambda n.n$, and we can let $\lhd(p) = (p)$ and $\delta((p)) = p$. Letting y be the result of snoc x xs, the type $\mathbb{Z} \times L^1(\mathbb{Z}) \stackrel{1/0}{\to} L^0(\mathbb{Z})$ indicates the following bound

$$\Phi(x:\mathbb{Z}, xs:L^1(\mathbb{Z})) + 1 - \Phi(y:L^0(\mathbb{Z})) = \phi(|xs|,1) + 1 - \phi(|y|,0) = |xs| + 1$$

This is exactly the amount of resources consumed, so the bound is tight.

In this work we only consider so-called *univariate* potential, wherein every term in the potential sum depends on the length of only one input list. However, different univariate potential summands may depend on different inputs, and thus univariate potential may still be multivariate. The term *multivariate potential* refers to using more general multivariate functions for potential. There is existing work on multivariate potential using polynomial functions [24]. We expect that the work here extends to multivariate potential similarly.

*Typing Rules* The typing rules in Figure 2 use the judgment $\Sigma; \Gamma \vdash^{q}_{q'} e : A$. In this typing judgment, Γ maps variables to types, while Σ maps function labels to sets of types. This judgment holds when, in the typing environment given by Σ and Γ, the expression e is of type A, subject to the constraints that q and q′ are the amounts of available resources before and after some evaluation of e. Unlike in the judgment V ⊢ e ⇓ v | (q, q′), these values need not be tight.

By expressing available resources on the turnstile, and potential resources in the types given by Σ,Γ, and A, the type system is set up to formalize the reasoning of the potential method. Theorem 1 shows that it is sound with respect to the operational semantics of Section 2.
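As an illustration (our example, not from the paper), the following derivation types two sequenced unit-cost ticks, with *Relax* loosening the first premiss so that *Let* can thread the two available resources through:

```latex
\[
\dfrac{\dfrac{\Sigma;\emptyset \vdash^{1}_{0} \operatorname{tick}\{1\} : \mathit{unit}}
             {\Sigma;\emptyset \vdash^{2}_{1} \operatorname{tick}\{1\} : \mathit{unit}}\ \textit{Relax}
       \qquad
       \Sigma;\, x : \mathit{unit} \vdash^{1}_{0} \operatorname{tick}\{1\} : \mathit{unit}\ \ (\textit{Tick})}
      {\Sigma;\emptyset \vdash^{2}_{0} \operatorname{let}(\operatorname{tick}\{1\};\, x.\operatorname{tick}\{1\}) : \mathit{unit}}\ \textit{Let}
\]
```

Two units are available before evaluation, one remains after the first tick, and zero after the second, matching the intended total cost of 2.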

Many typing rules preserve the total resource potential they are given, consuming none of it themselves. They therefore usually either have no explicit interaction with potential (e.g. *Lit*) or pass around exactly what they are given (e.g. *Let*). All basic rules in the first block of Figure 2 fit this characterization.

The typing rules concerning functions in the second block of Figure 2 are the only ones to make use of Σ. For each function f defined in *prog* via func{f}(x.e), Σ(f) refers to the set of types that its body e could be given. That we allow sets of types is important because recursive calls to a function may not always use a type with the same resource annotations; this is called *resource-polymorphic recursion*. Although these rules capture the intuition behind typing resource-polymorphic recursion, they are not used in existing implementations, as they lead to infinite type derivations. Nonetheless, there exists an effective way to type resource-polymorphic recursion with a finite derivation; see [26]. In the examples provided in this article, it usually suffices to consider only *resource-monomorphic recursion*, wherein inner and outer calls use the same annotation.

All of the rules discussed so far are simply those of existing AARA literature with their parameter for operation cost set to 0 (see e.g. [27]). This does not change their generality, as such constant cost can (and could already in prior work) be simulated using *tick*. Similarly, non-constant costs could be simulated by running helper functions using *tick* the appropriate number of times.

#### **Fig. 2.** AARA typing rules.

**Basic rules:**

$$\begin{array}{c}
\dfrac{}{\Sigma; \emptyset \vdash^{0}_{0} \mathit{lit} : \mathit{basic}}\ \textit{Lit}
\qquad
\dfrac{}{\Sigma;\, x : A \vdash^{0}_{0} x : A}\ \textit{Var}
\qquad
\dfrac{}{\Sigma;\, x : \mathit{basic} \vdash^{0}_{0} \operatorname{unop}(x) : \mathit{basic}}\ \textit{Unop}
\\[2ex]
\dfrac{}{\Sigma;\, x_1 : \mathit{basic},\, x_2 : \mathit{basic} \vdash^{0}_{0} \operatorname{binop}(x_1, x_2) : \mathit{basic}}\ \textit{Binop}
\qquad
\dfrac{}{\Sigma;\, x_1 : A_1,\, x_2 : A_2 \vdash^{0}_{0} \operatorname{pair}(x_1, x_2) : A_1 \times A_2}\ \textit{Pair}
\\[2ex]
\dfrac{\Sigma; \Gamma_1 \vdash^{q}_{p} e_1 : A \qquad \Sigma; \Gamma_2,\, x : A \vdash^{p}_{q'} e_2 : B}{\Sigma; \Gamma_1, \Gamma_2 \vdash^{q}_{q'} \operatorname{let}(e_1; x.e_2) : B}\ \textit{Let}
\qquad
\dfrac{\Sigma; \Gamma,\, x_1 : A_1,\, x_2 : A_2 \vdash^{q}_{q'} e : B}{\Sigma; \Gamma,\, x : A_1 \times A_2 \vdash^{q}_{q'} \operatorname{pairMatch}(x; x_1, x_2.e) : B}\ \textit{PMat}
\\[2ex]
\dfrac{\Sigma; \Gamma,\, x : \mathit{bool} \vdash^{q}_{q'} e_1 : A \qquad \Sigma; \Gamma,\, x : \mathit{bool} \vdash^{q}_{q'} e_2 : A}{\Sigma; \Gamma,\, x : \mathit{bool} \vdash^{q}_{q'} \operatorname{cond}(x; e_1; e_2) : A}\ \textit{Cond}
\end{array}$$

#### **Function rules:**

$$\dfrac{A \stackrel{q/q'}{\to} B \in \Sigma(f)}{\Sigma;\, x : A \vdash^{q}_{q'} \operatorname{app}\{f\}(x) : B}\ \textit{App}
\qquad
\dfrac{\operatorname{func}\{f\}(x.e) \in \mathit{prog} \qquad \Sigma;\, x : A \vdash^{q}_{q'} e : B}{A \stackrel{q/q'}{\to} B \in \Sigma(f)}\ \textit{Fun}$$

#### **Potential-focused rules:**

$$\begin{array}{c}
\dfrac{}{\Sigma; \Gamma \vdash^{\max(r, 0)}_{\max(-r, 0)} \operatorname{tick}\{r\} : \mathit{unit}}\ \textit{Tick}
\qquad
\dfrac{\Sigma; \Gamma \vdash^{p}_{p'} e : A \qquad q \ge p \qquad q - p \ge q' - p'}{\Sigma; \Gamma \vdash^{q}_{q'} e : A}\ \textit{Relax}
\\[2ex]
\dfrac{\Sigma; \Gamma,\, x : A' \vdash^{q}_{q'} e : B \qquad A <: A'}{\Sigma; \Gamma,\, x : A \vdash^{q}_{q'} e : B}\ \textit{SubWeakL}
\qquad
\dfrac{\Sigma; \Gamma \vdash^{q}_{q'} e : A \qquad A <: A'}{\Sigma; \Gamma \vdash^{q}_{q'} e : A'}\ \textit{SubWeakR}
\\[2ex]
\dfrac{\Sigma; \Gamma,\, x_2 : A_2,\, x_3 : A_3 \vdash^{q}_{q'} e : B \qquad A_1 \curlyvee (A_2, A_3)}{\Sigma; \Gamma,\, x_1 : A_1 \vdash^{q}_{q'} \operatorname{share}(x_1; x_2, x_3.e) : B}\ \textit{Sharing}
\end{array}$$

#### **List rules:**

$$\begin{array}{c}
\dfrac{}{\Sigma; \emptyset \vdash^{0}_{0} \operatorname{nil} : L^{P}(A)}\ \textit{Nil}
\qquad
\dfrac{}{\Sigma;\, x_h : A,\, x_t : L^{\lhd P}(A) \vdash^{\delta(P)}_{0} \operatorname{cons}(x_h; x_t) : L^{P}(A)}\ \textit{Cons}
\\[2ex]
\dfrac{\Sigma; \Gamma \vdash^{q}_{q'} e_1 : B \qquad \Sigma; \Gamma,\, x_h : A,\, x_t : L^{\lhd P}(A) \vdash^{q + \delta(P)}_{q'} e_2 : B}{\Sigma; \Gamma,\, x : L^{P}(A) \vdash^{q}_{q'} \operatorname{listMatch}(x; e_1; x_h, x_t.e_2) : B}\ \textit{LMat}
\end{array}$$

#### **Fig. 3.** AARA subtyping and sharing judgments.

$$\begin{array}{c}
\dfrac{}{\mathit{basic} <: \mathit{basic}}
\qquad
\dfrac{A_1 <: B_1 \qquad A_2 <: B_2}{A_1 \times A_2 <: B_1 \times B_2}
\qquad
\dfrac{A <: B \qquad \forall i.\ p_i \ge q_i}{L^{P}(A) <: L^{Q}(B)}\ \textit{Subtype}
\\[2ex]
\dfrac{}{\mathit{basic} \curlyvee (\mathit{basic}, \mathit{basic})}
\qquad
\dfrac{A_1 \curlyvee (B_1, C_1) \qquad A_2 \curlyvee (B_2, C_2)}{A_1 \times A_2 \curlyvee (B_1 \times B_2, C_1 \times C_2)}
\qquad
\dfrac{A \curlyvee (B, C)}{L^{P + Q}(A) \curlyvee (L^{P}(B), L^{Q}(C))}\ \textit{ShareList}
\end{array}$$
The remaining rules cover sharing, subtype-weakening, and the rules concerning lists. Weakening, though not listed, is also allowed.

Sharing is a form of contraction. By sharing, the rest of the typing rules can become affine, allowing only single usages of a given variable. Intuitively, sharing is meant to prevent duplicating potential across multiple usages of a variable, and instead split the potential across them. The rules for the sharing judgment, indicating how to split potential, can be found in Figure 3. Note that the rule *ShareList* adds indexed collections of rationals; this should be interpreted pointwise, as if the addends were vectors or matrices.
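For instance (our example), a list whose elements each carry 2 units of linear potential can be shared as two copies carrying 1 unit each, since *ShareList* adds the annotations pointwise:

```latex
\[
L^{(2)}(\mathbb{Z}) \;\curlyvee\; \big( L^{(1)}(\mathbb{Z}),\ L^{(1)}(\mathbb{Z}) \big)
\qquad \text{because } (2) = (1) + (1).
\]
```

Any other non-negative split, such as (2) = (2) + (0), would be equally valid; the type system simply forbids both copies from claiming the full potential.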

Subtype-weakening is a form of subtyping based on potential. It discards potential on a list, weakening the upper bound on resources it represents. This rule follows all usual subtyping rules, as well as *Subtype* from Figure 3. Relaxing behaves similarly, but loosens the bounds on the available resources instead.

The intuition for the rules concerning lists in the last block of Figure 2 is that total resources should be conserved between constructions and destructions. Because δ(P) expresses the difference in potential, it is exactly how many resource units are released after a pattern match on a list of type $L^{P}(A)$. For the same reason, it is also how many need to be stored when reversing the process and consing an element onto a list of type $L^{\lhd P}(A)$. Finally, when a list is empty, it has no room to store potential. Every potential function $f_i$ maps 0 to 0, so an empty list can safely be assigned any annotation, as each yields zero potential.

*Soundness* The soundness of the type system is expressed with the following theorem. It states that the evaluation of an expression e does not require more resources than initially present and, should evaluation terminate, leaves at least as many resources as dictated. The proof is a straightforward generalization of the version from [27], but we nonetheless reproduce it below.

**Theorem 1.** *Let* $\Sigma; \Gamma \vdash^{q}_{q'} e : B$ *and let* V *provide the variable bindings for* Γ*.*

*1. If* $V \vdash e \Downarrow v \mid (p, p')$ *then* $p \le \Phi(V : \Gamma) + q$ *and* $p - p' \le \Phi(V : \Gamma) + q - \Phi(v : B) - q'$*.*

*2. If* $V \vdash e \Downarrow^{\circ} \mid p$ *then* $p \le \Phi(V : \Gamma) + q$*.*

*Proof.* Assume V binds Γ's variables and perform nested induction on the type derivation and operational judgment for an expression in let-normal form. We show the induction below only for the terminating operational judgment cases, but the partial-evaluation cases are nearly identical.

(**Base Non-Cons**) Suppose the last rule applied in the typing derivation is any non-*Cons* base case, i.e., *Lit*, *Var*, *Unop*, *Binop*, *Pair*, *Nil*, or *Tick*. Then assume the appropriate terminating operational judgment rule applies. In each such case, one finds $p \le q$, $p' \ge q'$, and $\Phi(v : B) = \Phi(V : \Gamma)$. This and the non-negativity of potential are sufficient to satisfy the desired inequalities.

(**Base Cons**) Suppose the last rule is *Cons*, so $q = \delta(P)$ and $q' = 0$. Assume the *Cons* operational judgment applies, so that $p = p' = 0$. Note that $\Phi(v_h :: v_t : L^{P}(A))$ is equal to $\delta(P) + \Phi(v_h : A) + \Phi(v_t : L^{\lhd P}(A))$ by definition. This identity and the non-negativity of potential satisfy the desired inequalities.

(**Step Implicit Inequalities**) Suppose the last rule is one of *SubWeakL*, *SubWeakR*, *Relax*, or substructural weakening, and assume some operational judgment applies. Each typing requires a similar typing judgment as a premiss. Further, none changes any values, so the same operational judgment still applies. Thus, the inductive hypothesis applies and gives almost the inequalities we need. Each case provides the inequalities needed to finish. For subtype-weakening, it is sufficient to note that $C <: D$ entails $\Phi(v : C) \ge \Phi(v : D)$, since the annotations of C are pointwise greater than or equal to those of D. For *Relax*, the premisses of the rule directly include the inequalities needed to complete the case. And we can complete the substructural weakening case by noting that the non-negativity of potential entails $\Phi(V : (\Gamma, x : A)) \ge \Phi(V : \Gamma)$.

(**Step Let**) Suppose the last rule is *Let*, and suppose its operational judgment applies. The premisses of the typing rule require that $\Sigma; \Gamma_1 \vdash^{q}_{r} e_1 : A$ and $\Sigma; \Gamma_2, x : A \vdash^{r}_{q'} e_2 : B$. The premisses of the operational judgment require that $V \vdash e_1 \Downarrow v_1 \mid (s, s')$ and $V[x \mapsto v_1] \vdash e_2 \Downarrow v_2 \mid (t, t')$, where $p = s + \max(t - s', 0)$ and $p' = t' + \max(s' - t, 0)$. Applying the inductive hypothesis to these premiss pairs and adding the resulting inequalities cancels terms to complete the case.

(**Step Sharing**) Suppose the last rule is *Sharing*, so that $\Gamma = \Gamma', x_1 : A_1$. It requires as a premiss that $\Sigma; \Gamma', x_2 : A_2, x_3 : A_3 \vdash^{q}_{q'} e : B$, where $A_1 \curlyvee (A_2, A_3)$. Assuming the operational judgment *Share* applies, $V[x_2 \mapsto V(x_1), x_3 \mapsto V(x_1)] \vdash e \Downarrow v \mid (p, p')$ also holds. The inductive hypothesis applies, yielding the needed inequalities, but for $x_2, x_3$ instead of $x_1$. However, the sharing relation ensures that $\Phi(v_1 : A_1) = \Phi(v_2 : A_2) + \Phi(v_3 : A_3)$, and this identity finishes the case.

(**Step ListMatch**) Suppose the last rule is *ListMatch*, so $\Gamma = \Gamma', x : L^{P}(A)$. There are two operational judgments which could apply: *LMat0* and *LMat1*.

Suppose the former judgment applies. It requires that $V \vdash e_1 \Downarrow v \mid (p, p')$. At the same time, the *ListMatch* rule requires as a premiss that $\Sigma; \Gamma' \vdash^{q}_{q'} e_1 : B$. The inductive hypothesis applies, yielding the needed inequalities, but for $\Gamma'$ instead of Γ. However, because $\Phi(\operatorname{nil} : L^{P}(A)) = 0$, we see $\Phi(V : \Gamma') = \Phi(V : \Gamma)$, and the desired inequalities result.

Suppose instead the latter judgment applies. This judgment requires as a premiss that $V[x_h \mapsto v_h, x_t \mapsto v_t] \vdash e_2 \Downarrow v \mid (p, p')$. At the same time, the *ListMatch* rule requires that $\Sigma; \Gamma', x_h : A, x_t : L^{\lhd P}(A) \vdash^{q + \delta(P)}_{q'} e_2 : B$. The inductive hypothesis applies, telling us that $p - p' \le \Phi(V : (\Gamma', x_h : A, x_t : L^{\lhd P}(A))) + q + \delta(P) - \Phi(v : B) - q'$ and $p \le \Phi(V : (\Gamma', x_h : A, x_t : L^{\lhd P}(A))) + q + \delta(P)$. By definition, $\Phi(v_h :: v_t : L^{P}(A)) = \delta(P) + \Phi(v_h : A) + \Phi(v_t : L^{\lhd P}(A))$, and applying this identity to the inequalities yields the inequalities needed.

(**Step Cond**) Suppose the last rule is *Cond*, and that either of the *CondT* or *CondF* operational judgments apply. In either case, applying the inductive hypothesis to its premiss and the premiss of *Cond* gives the needed inequalities.

(**Step PMat**) Suppose that the last rule applied is *PMat*, so that $\Gamma = \Gamma', x : A_1 \times A_2$. This rule requires as a premiss that $\Sigma; \Gamma', x_1 : A_1, x_2 : A_2 \vdash^{q}_{q'} e' : B$, for $e'$ the body of the match statement e. Suppose the *PMat* operational judgment applies. This judgment requires as a premiss that $V[x_1 \mapsto v_1, x_2 \mapsto v_2] \vdash e' \Downarrow v \mid (p, p')$, where the value of x is $(v_1, v_2)$. Applying the inductive hypothesis to these premisses, followed by the definitional identity $\Phi((v_1, v_2) : A_1 \times A_2) = \Phi(v_1 : A_1) + \Phi(v_2 : A_2)$, completes the case.

(**Step App**) Suppose the last rule is *App*. Note that this rule requires *Fun* as a premiss, which in turn requires $\Sigma; x' : A \vdash^{q}_{q'} e' : B$, where $e'$ is the body of the function being applied. If the *App* operational judgment applies, its premiss requires $V[x' \mapsto V(x)] \vdash e' \Downarrow v \mid (p, p')$. Although $e'$ might not be a smaller expression than e, the operational judgment derivation still shrinks. This means the inductive hypothesis applies, and it gives the exact inequalities needed.

*Type Inference* Type inference for the Hindley-Milner part of the type system is decidable [21, 41]. The only new barrier for automating inference in AARA is obtaining witnesses for all the coefficients in each annotation P in a derivation.

Each typing rule naturally gives a set of linear constraints on the entries of P. If the relations given by $\lhd$ and δ can likewise be expressed with linear constraints, then all such constraints are linear. So long as |P| is finite, this forms a linear program, and a linear program solver can then find minimal witnesses efficiently.

Existing AARA literature (see e.g. [27]), however, uses binomial coefficients as the basis functions for P, of which there are infinitely many. This nonetheless works because only a particular finite prefix $\binom{n}{1}, \ldots, \binom{n}{k}$ of that set is used as a basis in a given analysis. Each such prefix basis also yields the same locally-definable shift operation: the linear equality $(\lhd P)_i = p_i + p_{i+1}$, where $p_i$ is the coefficient of $\binom{n}{i}$ and the coefficients of functions outside the prefix are 0. As this is a linear relation, and each prefix is finite, inference can be performed via linear programming. The prefix bases of binomial coefficients thereby form an infinite family of finite bases, each of which allows automated inference of resource polynomials up to a fixed degree in the AARA system.
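This shift can be spot-checked against Equation 1 numerically. The Python encoding below is ours: coefficients $p_1, \ldots, p_k$ of $\binom{n}{1}, \ldots, \binom{n}{k}$ are stored from list index 0, and δ(P) is taken to be the coefficient of $\binom{n}{1}$, which is what makes the recurrence balance.

```python
from math import comb

def phi(n, p):
    # phi(n, P) over the prefix basis C(n,1), ..., C(n,k);
    # p[i] holds the coefficient of C(n, i+1).
    return sum(pi * comb(n, i + 1) for i, pi in enumerate(p))

def shift(p):
    # (<|P)_i = p_i + p_{i+1}, with the coefficient past the prefix 0.
    return [p[i] + (p[i + 1] if i + 1 < len(p) else 0)
            for i in range(len(p))]

def delta(p):
    # Constant difference: the coefficient of C(n, 1).
    return p[0]
```

By Pascal's rule $\binom{n+1}{i} = \binom{n}{i} + \binom{n}{i-1}$, one checks that `phi(n + 1, p) == delta(p) + phi(n, shift(p))` for every n.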

As a caveat, not all programs use resources in a manner compatible with the AARA system. Indeed, it is undecidable whether a program uses, e.g., polynomial amounts of resources, as deciding this would solve the halting problem.

## **4 Exponential Potential**

Stirling numbers of the second kind $\genfrac{\{}{\}}{0pt}{}{n}{k} = \frac{1}{k!}\sum_{i=0}^{k}(-1)^{i}\binom{k}{i}(k-i)^{n}$ count the number of ways to form a k-partition of a set of n elements. These can be used to express exponential potential functions, similarly to how binomial coefficients can express polynomial ones. In particular, we make use of Stirling numbers with arguments n, k offset by 1, $\genfrac{\{}{\}}{0pt}{}{n+1}{k+1}$, so that $\phi(n, P) = \sum_{i} p_i \cdot \genfrac{\{}{\}}{0pt}{}{n+1}{i+1}$. While other bases could also express exponential potential, these offset Stirling numbers have a few particularly desirable properties, which are described in this section.

*Simple Shift Operation* Like binomial coefficients, the prefixes of the basis of the offset Stirling numbers of the second kind form an infinite family of finite bases, each of which allows automated inference in the AARA system. However, these potential functions are exponential rather than polynomial.

Stirling numbers of the second kind satisfy the recurrence $\genfrac{\{}{\}}{0pt}{}{n+1}{k+1} = (k+1)\genfrac{\{}{\}}{0pt}{}{n}{k+1} + \genfrac{\{}{\}}{0pt}{}{n}{k}$. This recurrence allows the $\lhd$ operation to have the same local definition for every annotation entry in every prefix basis: $(\lhd P)_i = (i+1)p_i + p_{i+1}$, where $p_i$ is the coefficient of $\genfrac{\{}{\}}{0pt}{}{n+1}{i+1}$ and is 0 if the function index is outside the chosen prefix. Given this definition for $\lhd$ and letting $\delta(P) = p_1$, we find $p_1 + \sum_{i} (\lhd P)_i \genfrac{\{}{\}}{0pt}{}{n+1}{i+1} = \sum_{i} p_i \genfrac{\{}{\}}{0pt}{}{n+2}{i+1}$, satisfying Equation 1.
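The same kind of numeric spot-check as for binomials applies here. In this encoding of ours, coefficients $p_1, \ldots, p_k$ of the offset Stirling numbers are stored from list index 0, and δ(P) is taken to be the first coefficient:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling(n, k):
    # Stirling numbers of the second kind via the standard
    # recurrence S(n+1, k) = k * S(n, k) + S(n, k-1).
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling(n - 1, k) + stirling(n - 1, k - 1)

def phi(n, p):
    # p[j] holds the coefficient p_{j+1} of S(n+1, j+2).
    return sum(pj * stirling(n + 1, j + 2) for j, pj in enumerate(p))

def shift(p):
    # (<|P)_i = (i+1) * p_i + p_{i+1}; list index j stores p_{j+1}.
    return [(j + 2) * p[j] + (p[j + 1] if j + 1 < len(p) else 0)
            for j in range(len(p))]

def delta(p):
    return p[0]
```

For every annotation prefix and every n, `phi(n + 1, p) == delta(p) + phi(n, shift(p))`, i.e., Equation 1 holds for this exponential basis as well.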

This shift operation yields a linear relation, as the coefficient of a given p<sup>i</sup> is a constant scalar. Thus, exactly like when using binomial coefficients, inference is automatable via linear programming. Certain other exponential bases, like Gaussian binomial coefficients, could be similarly automated.

*Expressivity* Because $\genfrac{\{}{\}}{0pt}{}{n+1}{k+1} = \frac{1}{k!}\sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i}(i+1)^{n} \in \Theta((k+1)^{n})$, the offset Stirling numbers of the second kind can form a linear basis for the space of sums of exponential functions. Each function $\lambda n.b^{n}$ with $b \ge 1$ can be expressed as a linear combination of the functions $\lambda n.\genfrac{\{}{\}}{0pt}{}{n+1}{k+1}$.

The function $\lambda n.\genfrac{\{}{\}}{0pt}{}{n+1}{k+1}$ is also non-negative for natural n, and non-decreasing with respect to n. These are two natural properties to require of basic potential functions, since amortized analysis requires non-negative resources, and larger inputs should not usually become cheaper to process. Further, the properties are preserved by non-negative linear (i.e. conical) combination, and, when $\lhd$ is defined by a non-negative linear recurrence, the combinations given by P and $\lhd P$ always satisfy the two potential function properties.

Ensuring these properties for more general potential functions requires determining whether such a function on a natural domain is always non-negative, which is non-trivial. In the existing literature on multivariate polynomials, this problem is *undecidable* in the worst case [40]. However, restricting to non-negative linear (that is, *conical*) combinations of non-negative, non-decreasing functions, as we have done here, gives simple linear constraints that ensure both desired properties. For finite bases, this is easily handled via linear programming.

When considering expressivity in this conical combination model of potential functions, one finds that some otherwise-valid potential functions are not expressible in the conical space given by the offset Stirling number functions. Nonetheless, the Stirling number functions form a *maximally expressive* basis; it is not possible to express additional potential functions using a different basis without losing expressibility elsewhere. Notably, the standard exponential basis is *not* maximal in this sense. The formal statement of such maximal expressivity is generalized in the theorem below. Any finite, sequential subset of the offset Stirling number functions satisfies the prerequisites of this theorem, as do the binomial coefficient functions and other well-known functions like the Gaussian polynomials.

**Theorem 2.** *Let* $\{f_i\}$ *be a finite set of linearly independent functions on the naturals that are non-negative and non-decreasing. Let* $f_i(n)$ *be 0 for* $n < i$*, and let* $i \le j$ *imply that* $O(f_i) \subseteq O(f_j)$*, with asymptotic equality only when* $i = j$*. Let* L *be the linear span (collection of linear combinations) of* $\{f_i\}$*, and let* C *be its conical span (collection of conical combinations).*

*There does not exist another linearly independent basis* $\{g_i\}$ *with linear span* L *and conical span* $D \supsetneq C$ *such that each function in* $\{g_i\}$ *is non-negative and non-decreasing. That is,* $\{f_i\}$ *has a maximally expressive conical span.*

*Proof.* Suppose there were such a basis $\{g_i\}$. We express each of the bases $\{f_i\}$ and $\{g_i\}$ as linear combinations of the other, and derive a contradiction.

If there is any function in the conical span D of $\{g_i\}$ that is not in C, then this is the case for some basis function $g_k$. Because $g_k \in L$, it can be written as a linear combination of $\{f_i\}$; let $\sum_i \alpha_i f_i = g_k$. Because $g_k \notin C$, there is at least one coefficient $\alpha_i < 0$; let it be $\alpha_m$. In case there are multiple candidate elements $g_k$, pick $g_k$ to be the basis function for which this index m is minimized.

We then see that $g_k(m) = \sum_i \alpha_i f_i(m) = (\sum_{i < m} \alpha_i f_i(m)) + \alpha_m f_m(m)$, because $f_i(m) = 0$ for $i > m$. This yields two observations. First, $m < k$, as otherwise the fastest-growing term of $g_k$ would be negative, but $g_k$ is never negative. Second, the term $\alpha_m f_m(m)$ is negative, yet $g_k \ge 0$, so it must be that $\sum_{i < m} \alpha_i f_i(m) > 0$. Thus there exists a coefficient $\alpha_p > 0$ where $p < m$.

Now we look at representing $\{f_i\}$ with $\{g_i\}$. Because the conical span D contains C, each $f_i$ can be represented as a conical combination. Notably, a given $f_i$ cannot be represented using only functions outside of $\Omega(f_i)$, nor using any function outside of $O(f_i)$, due to growth rates. There is therefore at least one function in $\{g_i\}$ that is $\Theta(f_i)$, for each i. Since the linear span of these corresponding $g_i$ already has the same (finite) dimension as L, any additional functions would not be linearly independent. Due to this, we can say $g_i \in \Theta(f_i)$ uniquely for each i.

Take $f_k$ in particular as a conical combination of $\{g_i\}$. We now consider replacing each element of $\{g_i\}$ in that conical combination with its equivalent linear combination of elements of $\{f_i\}$. Because of the above correspondence of growth rates, there must be a positive coefficient on $g_k$. Because $g_k$ has positive weight $\alpha_p$ on $f_p$ where $p < m < k$, another basis function $g_i$ in the conical combination must have negative weight on $f_p$ to cancel it out in the overall linear combination. However, $g_k$ was picked such that it had the lowest index $m$ with negative weight across all of $\{g_i\}$; it is contradictory for there to be such a $p < m$.

*Natural Semantics* The values of $\begin{Bmatrix} n+1 \\ k+1 \end{Bmatrix}$ count the number of ways to pick $k$ nonempty disjoint subsets of $n$ elements. Many programs with exponential resource usage iterate over collections of subsets, so these numbers arise naturally.
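This counting interpretation can be checked by brute force. The helper names `stirling2` and `count_disjoint_subsets` below are illustrative, not from the paper; the sketch compares a direct enumeration against the standard Stirling recurrence.

```python
from itertools import product
from math import factorial

def stirling2(n, k):
    """Stirling numbers of the second kind, S(n, k) = k*S(n-1, k) + S(n-1, k-1)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def count_disjoint_subsets(n, k):
    """Brute-force count of ways to pick k nonempty disjoint subsets of [n]:
    assign each element to one of k labelled bins or to 'unused' (bin 0),
    require every bin nonempty, then divide by k! to forget the bin labels."""
    total = 0
    for assignment in product(range(k + 1), repeat=n):
        if all(b in assignment for b in range(1, k + 1)):
            total += 1
    return total // factorial(k)

# The counts match the offset Stirling numbers S(n+1, k+1).
for n in range(1, 6):
    for k in range(1, 4):
        assert count_disjoint_subsets(n, k) == stirling2(n + 1, k + 1)
```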

Recall the naive solution to subset sum from the introduction. The algorithm iterates through all the subsets of numbers in the input list. In light of Fagin's descriptive-complexity result that NP problems are precisely those expressible in existential second-order logic [14], naive solutions to any NP-complete problem fit this characterization: brute-forcing through second-order terms to find an existential witness is just iterating through tuples of subsets.

*Example* Consider the naive solution to subset sum from the introduction. One can verify by induction that the number of Boolean and arithmetic operations used on an input of size $n$ is $3 \cdot 2^n - 2$. We find the same bound here by preceding each such operation with an explicit *tick*{1} operation. The AARA type system then verifies that the type of *subsetSum* is $L^{3}(\mathbb{Z}) \times \mathbb{Z} \xrightarrow{1/0} \textit{bool}$.

Here is the code again, with type annotations on each line tracking the amount of $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential on lists, and comments tracking available constant potential. For clarity, the code is rewritten in let-normal form, and sharing locations are marked.


The indicated values yield witnesses for the AARA typing rules, so we know via soundness that the difference between initial and ending potential gives an upper bound on the number of operations used. That difference is $1 + 3\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} = 3 \cdot 2^n - 2$, where $n$ is the size of *nums*, exactly the amount used.
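The operation count can also be confirmed dynamically. The following is a hypothetical Python rendering of the same algorithm (the ML listing is elided above), with each *tick* simulated by incrementing a counter:

```python
def subset_sum_cost(nums, target):
    """Naive subset sum, instrumented so that every Boolean or arithmetic
    operation is preceded by a tick. Returns (answer, operation count)."""
    cost = 0
    def go(nums, target):
        nonlocal cost
        if not nums:
            cost += 1                 # tick 1: the comparison target = 0
            return target == 0
        hd, tl = nums[0], nums[1:]
        cost += 1                     # tick 1: the subtraction target - hd
        with_hd = go(tl, target - hd)
        without = go(tl, target)
        cost += 1                     # tick 1: the disjunction
        return with_hd or without
    answer = go(list(nums), target)
    return answer, cost

# The measured cost matches the bound 3 * 2^n - 2 exactly.
for n in range(0, 9):
    _, cost = subset_sum_cost(range(1, n + 1), n)
    assert cost == 3 * 2**n - 2
```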

Exponential terms with bases higher than 2 come into play with more recursive calls, as in the code below enumerating the $3^n$ ways to put $n$ labelled balls into 3 labelled bins.


By paying a unit of resource for each such way using *tick*, we can use AARA to bound the count. It assigns *ballBins3* the type $L^{2,2}(\mathbb{Z}) \xrightarrow{1/0} L^{0,0}(L^{0,0}(\mathbb{Z}) \times L^{0,0}(\mathbb{Z}) \times L^{0,0}(\mathbb{Z}))$, where the superscripts track $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ and $\begin{Bmatrix} n+1 \\ 3 \end{Bmatrix}$ potential, respectively. Since $2\begin{Bmatrix} n+1 \\ 3 \end{Bmatrix} + 2\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} + 1 = 3^n$, this bound is exact.
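The exactness claim reduces to a numeric identity, which is easy to sanity-check (using a hypothetical `stirling2` helper):

```python
def stirling2(n, k):
    """Stirling numbers of the second kind via the standard recurrence."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# 2*S(n+1, 3) + 2*S(n+1, 2) + 1 = 3^n: the potential paid out by the type of
# ballBins3 covers exactly one tick per way of filling the three bins.
for n in range(0, 10):
    assert 2 * stirling2(n + 1, 3) + 2 * stirling2(n + 1, 2) + 1 == 3**n
```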

## **5 Mixed Potential**

It is possible to combine the existing polynomial potential functions with these new exponential potential functions, not only conservatively extending both, but further representing potential functions built from their products. This space represents functions in $\Theta(n^k (b+1)^n)$ for naturals $k, b$, and does so with terms of the form $\binom{n}{k}\begin{Bmatrix} n+1 \\ b+1 \end{Bmatrix}$, so that $\phi(n, P) = \sum_{b,k} p_{b,k} \cdot \binom{n}{k}\begin{Bmatrix} n+1 \\ b+1 \end{Bmatrix}$. Note that for $k$ or $b$ equal to 0, the potential functions here reduce to the offset Stirling numbers or binomial coefficients, respectively.

The methods used here to combine these potential functions can easily be generalized to combine any two suitable sets.

*Simple Shift Operation* It is straightforward to find a linear recurrence for these products by distributing multiplication over their individual linear recurrences.

$$\begin{aligned} \binom{n+1}{k+1} \begin{Bmatrix} n+2 \\ b+2 \end{Bmatrix} &= (\binom{n}{k+1} + \binom{n}{k})((b+2)\begin{Bmatrix} n+1 \\ b+2 \end{Bmatrix} + \begin{Bmatrix} n+1 \\ b+1 \end{Bmatrix}) \\ &= (b+2)\binom{n}{k+1}\begin{Bmatrix} n+1 \\ b+2 \end{Bmatrix} + (b+2)\binom{n}{k}\begin{Bmatrix} n+1 \\ b+2 \end{Bmatrix} + \binom{n}{k+1}\begin{Bmatrix} n+1 \\ b+1 \end{Bmatrix} + \binom{n}{k}\begin{Bmatrix} n+1 \\ b+1 \end{Bmatrix} \end{aligned}$$

As before, this yields definitions for the shift operation and $\delta$ in accordance with Equation 1. Letting $P$ now be indexed by pairs $(b, k)$, the shifted annotation is given by $\lhd(P)_{b,k} = (b+1)p_{b,k} + (b+1)p_{b,k+1} + p_{b+1,k} + p_{b+1,k+1}$, and $\delta(P) = p_{0,1} + p_{1,0} + p_{1,1}$. Noting that these definitions are again linear yields automatability for finite (2-dimensional) prefixes of the basis.
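These definitions can be validated numerically, assuming (as suggested by the surrounding text, since Equation 1 is elided here) that the shift satisfies $\phi(n+1, P) = \phi(n, \lhd(P)) + \delta(P)$. The names `phi`, `shift`, and `delta` below are illustrative:

```python
from math import comb
import random

def stirling2(n, k):
    """Stirling numbers of the second kind via the standard recurrence."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def phi(n, P):
    """phi(n, P) = sum over (b, k) of p[b][k] * C(n, k) * S(n+1, b+1)."""
    return sum(P[b][k] * comb(n, k) * stirling2(n + 1, b + 1)
               for b in range(len(P)) for k in range(len(P[0])))

def shift(P):
    """The 2-dimensional shift suggested by the recurrence above."""
    B, K = len(P), len(P[0])
    get = lambda b, k: P[b][k] if b < B and k < K else 0
    Q = [[(b + 1) * get(b, k) + (b + 1) * get(b, k + 1)
          + get(b + 1, k) + get(b + 1, k + 1) for k in range(K)]
         for b in range(B)]
    Q[0][0] = P[0][0]  # keep the constant coefficient; its surplus is delta(P)
    return Q

def delta(P):
    return P[0][1] + P[1][0] + P[1][1]

# phi(n+1, P) = phi(n, shift(P)) + delta(P) on random annotations.
random.seed(1)
for _ in range(50):
    P = [[random.randint(0, 4) for _ in range(3)] for _ in range(3)]
    for n in range(0, 5):
        assert phi(n + 1, P) == phi(n, shift(P)) + delta(P)
```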

*Expressivity* The product of non-negative, non-decreasing functions is still non-negative and non-decreasing, so products of valid potential functions are still valid. Soundness is preserved by letting $p_0$ be shorthand for the new constant-function coefficient $p_{0,0}$ wherever it is used in Theorem 1. Moreover, maximality of expressivity is preserved, simply by giving index pairs the ordering relation $(i_1, i_2) \le (j_1, j_2) \iff i_1 \le j_1 \wedge i_2 \le j_2$ and applying Theorem 2.

*Example* Consider bounding the number of Boolean and arithmetic operations in a variation of subset sum: *single-use* subset sum. Here the input may contain duplicate numbers that should be ignored, so as to treat the input as a true set. This is a trivial change to the mathematical problem, but one that real code might have to deal with, depending on the implementation of sets.

The code can be changed to handle this by removing all later duplicates of each number it reaches, so that later recursive calls never see the number again. It is easy to create a function *remove* of type $\mathbb{Z} \times L^{a+1,b,c}(\mathbb{Z}) \xrightarrow{d/d} L^{a,b,c}(\mathbb{Z})$ to do this for any $a, b, c, d$, where the superscript values represent linear, $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$, and $n\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential, respectively.

One can prove by induction that at most $4 \cdot 2^n - n - 3$ Boolean or arithmetic operations are required. Although this can be bounded with only exponential functions, the purely exponential potential system cannot reason about the exact (linear) cost associated with *remove*, and overestimates the bound to be in $\Theta(3^n)$. This mixed system can provide a better (though still loose) bound of $n2^n + 2 \cdot 2^n - n - 1$, giving the type $L^{0,2,1}(\mathbb{Z}) \times \mathbb{Z} \xrightarrow{1/0} \textit{bool}$ to *subSum1*. After showing this derivation, we will show how to find the exact bound with AARA.

The following is the single-use subset sum code, with comments tracking the amount of available resources on each line. For clarity, we indicate sharing and subtype-weakening locations.

```
let subSum1 nums:L0,2,1(Z) target = (* 1 *)
  match nums with
  | [] → (* 1 *)
     tick 1; target = 0 (* 0 *)
  | hd::(tl:L1,6,2(Z)) → (* 4 *)
     let otherNums:L0,6,2(Z) = remove hd tl:L1,6,2(Z) in (* 4 *)
     tick 1; let newTarg = target - hd in (* 3 *)
     (* weaken otherNums:L0,6,2(Z) to L0,4,2(Z) *)
     (* share otherNums:L0,4,2(Z) as L0,2,1(Z), L0,2,1(Z) *)
     let withNum = subSum1 otherNums:L0,2,1(Z) newTarg in (* 2 *)
     let without = subSum1 otherNums:L0,2,1(Z) target in (* 1 *)
     tick 1; withNum || without (* 0 *)
```
The difference between initial and ending potential gives the upper bound of $1 + 2\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} + n\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} = n2^n + 2 \cdot 2^n - n - 1$ Boolean or arithmetic operations.

Note that we use the subtype-weakening rule, throwing away 2 units of $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential. This indicates why the bound is not tight. Next we show how to improve this bound using potential demotion.
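Both the exact count and the looseness of the inferred bound can be checked by instrumentation. The sketch below is a hypothetical Python rendering of *subSum1*; it assumes *remove* charges one comparison per element it scans, which matches the claimed count on distinct inputs:

```python
def sub_sum1_cost(nums, target):
    """Single-use subset sum: duplicates of the head are removed before the
    recursive calls. Every Boolean/arithmetic operation increments cost."""
    cost = 0
    def remove(x, l):
        nonlocal cost
        out = []
        for y in l:
            cost += 1            # one equality test per scanned element
            if y != x:
                out.append(y)
        return out
    def go(nums, target):
        nonlocal cost
        if not nums:
            cost += 1            # tick 1: the comparison target = 0
            return target == 0
        hd, tl = nums[0], nums[1:]
        other = remove(hd, tl)
        cost += 1                # tick 1: the subtraction target - hd
        with_hd = go(other, target - hd)
        without = go(other, target)
        cost += 1                # tick 1: the disjunction
        return with_hd or without
    answer = go(list(nums), target)
    return answer, cost

# On n distinct inputs the measured cost is exactly 4 * 2^n - n - 3, and the
# inferred potential n*2^n + 2*2^n - n - 1 is a (loose) upper bound on it.
for n in range(0, 9):
    _, cost = sub_sum1_cost(range(1, n + 1), n)
    assert cost == 4 * 2**n - n - 3
    assert cost <= n * 2**n + 2 * 2**n - n - 1
```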

*Demotion* There is one special exception to the non-negativity of potential annotations that may be added, due to the particular relation between binomial coefficients and Stirling numbers. It represents the concept of *demoting* exponential potential into polynomial potential.

The relevant relation is $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} = 2^n - 1 = \sum_{i=1}^{\infty} \binom{n}{i} \ge \sum_{i=1}^{k} \binom{n}{i}$. This allows a unit of $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential to account for one unit *each* of all non-constant binomial coefficient potentials. We can express this with the following additional subtyping rule. In this rule we interpret the 2-dimensional indexing of the potential annotation as a matrix, and we let $\overrightarrow{p}$ refer to the vector of potential entries at index coordinates $(0, i)$ for $i \ge 1$.

$$\frac{P = R + \begin{bmatrix} 0 & \overrightarrow{p} \\ r & 0 \end{bmatrix} \qquad Q = R + \begin{bmatrix} 0 & \overrightarrow{p} + s \ast \overrightarrow{1} \\ r - s & 0 \end{bmatrix}}{L^P(A) <: L^Q(A)}$$
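The numeric relation driving this rule is easy to confirm (with a hypothetical `stirling2` helper):

```python
from math import comb

def stirling2(n, k):
    """Stirling numbers of the second kind via the standard recurrence."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# S(n+1, 2) = 2^n - 1 = sum over i >= 1 of C(n, i), which dominates any
# finite prefix of that sum, so one exponential unit can pay for one unit
# of every non-constant binomial potential at once.
for n in range(0, 12):
    total = sum(comb(n, i) for i in range(1, n + 1))
    assert stirling2(n + 1, 2) == 2**n - 1 == total
    for k in range(0, n + 1):
        assert total >= sum(comb(n, i) for i in range(1, k + 1))
```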

**Theorem 3.** *The demotion rule is sound.*

*Proof.* We need only show that $C <: D$ implies $\Phi(v : D) \le \Phi(v : C)$ for unchanged values $v$. The rest of soundness then follows as in Theorem 1. To do so, it is sufficient to show that for $l = [a_1, \dots, a_n]$ we have $\Phi(l : L^Q(A)) \le \Phi(l : L^P(A))$.

Without loss of generality, we need only consider the case $R = 0$.

$$\begin{aligned} \Phi(l:L^Q(A)) &= \phi(n,Q) + \sum_{i=1}^n \Phi(a_i:A) \\ &= (r-s)\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} + \sum_{i=1}^{k} (\overrightarrow{p}_{i-1} + s)\binom{n}{i} + \sum_{i=1}^n \Phi(a_i:A) \\ &= \sum_{i=1}^{\infty} (r-s)\binom{n}{i} + \sum_{i=1}^{k} (\overrightarrow{p}_{i-1} + s)\binom{n}{i} + \sum_{i=1}^{n} \Phi(a_i : A) \\ &\leq \sum_{i=1}^{\infty} r\binom{n}{i} + \sum_{i=1}^{k} \overrightarrow{p}_{i-1}\binom{n}{i} + \sum_{i=1}^{n} \Phi(a_i : A) \\ &= r\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} + \sum_{i=1}^{k} \overrightarrow{p}_{i-1}\binom{n}{i} + \sum_{i=1}^{n} \Phi(a_i : A) \\ &= \phi(n, P) + \sum_{i=1}^{n} \Phi(a_i : A) = \Phi(l : L^P(A)) \end{aligned}$$

As a corollary, this allows us to loosen the constraint that every annotation $P$ contain only non-negative rationals. In particular, it is no longer required that $\forall i.\, p_{0,i} \ge 0$. Instead, we require that $\forall i.\, p_{0,i} + p_{1,0} \ge 0$. Each unit of $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential may "pay" for one unit of deficit from each polynomial potential function. Because this is still a linear constraint, type inference remains automatable.

Using *Demote*, tighter bounds can be obtained. Consider the single-use subset sum solution from the previous section. Here it is again below, but this time the linear potential is allowed to be paid for by $\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential. AARA can now provide the type $L^{-1,4,0}(\mathbb{Z}) \times \mathbb{Z} \xrightarrow{1/0} \textit{bool}$ for *subSum1*, corresponding to the exact upper bound of $4 \cdot 2^n - n - 3$ operations. This time the $n\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix}$ potential is elided in the annotations, as it is not needed.


The difference between initial and ending potential gives the upper bound of $1 - n + 4\begin{Bmatrix} n+1 \\ 2 \end{Bmatrix} = 4 \cdot 2^n - n - 3$, as desired.

## **6 Exponentials, Polynomials, and Logarithms**

The addition of exponential potential also allows for the inference of previously non-derivable polynomial-resource types for certain programs. One way this can happen is by compacting the potential of a list into a new list logarithmic in the size of the first. Performing exponential-cost operations, such as *subsetSum*, on a list of logarithmic size only incurs linear cost in total.

In the code below, *log* takes a list $x$ of length $n$ and returns a list of length roughly $\log_2(n)$. If $x$ begins with one unit of linear potential, the type system assigns the output of *log* one unit of base-2 exponential ($2^n - 1$) potential. We show this in the code below with types of the form $L^{a,b}$, where $a$ is the linear potential and $b$ is the base-2 exponential potential. This lets us find that *half* can have type $L^{1,0}(\mathbb{Z}) \xrightarrow{0/0} L^{2,0}(\mathbb{Z})$ and *log* has type $L^{1,0}(\mathbb{Z}) \xrightarrow{0/0} L^{0,1}(\mathbb{Z})$. The typing of *log* shows the conversion from linear to exponential potential.


Typing *log* above requires resource-polymorphic recursion. However, this can be justified by noting that the above derivation shows that *half* has type $L^{a,0}(\mathbb{Z}) \xrightarrow{0/0} L^{2a,0}(\mathbb{Z})$ and *log* has type $L^{a,0}(\mathbb{Z}) \xrightarrow{0/0} L^{0,a}(\mathbb{Z})$ for any $a \ge 0$.
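Since the ML listing is elided, here is a hypothetical Python rendering of *half* and *log* that exhibits the size relationship underlying the potential conversion: the output length $m$ always satisfies $2^m - 1 \le n$, so $n$ units of linear potential cover one unit of base-2 exponential potential on the output.

```python
def half(x):
    """Keeps every other element; the output is at most half as long."""
    if len(x) < 2:
        return []
    return [x[0]] + half(x[2:])

def log_list(x):
    """Repeatedly halves the tail, yielding roughly log2(len(x)) elements."""
    if not x:
        return []
    return [x[0]] + log_list(half(x[1:]))

# One unit of linear potential on x pays for one unit of 2^m - 1 potential
# on the output of length m, because 2^len(log_list(x)) - 1 <= len(x).
for n in range(0, 64):
    m = len(log_list(list(range(n))))
    assert 2**m - 1 <= n
```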

Coincidentally, *log*'s conversion of linear to exponential potential certifies that the output list's size can be bounded by a logarithm of the input's size. Nonetheless, logarithmic *potential* is not directly compatible with the approach this work takes. Sublinear functions have negative second derivatives, and this yields negative annotation entries under application of the shift. This may not be insurmountable, as the demotion rule here showed, but new ideas are needed. Logarithmic potential has been explored in [32], though the approach there departs from the automatable AARA framework of linear constraint solving.

## **7 Conclusion and Future Work**

Using Stirling numbers of the second kind allows for the automated inference of exponential resource usage via Automatic Amortized Resource Analysis. This may be combined with the existing polynomial system, allowing mixtures of polynomial and exponential functions to be inferred. Under this system, more kinds of programs can now be automatically analyzed, in particular those making use of multiple recursive calls or logarithmically-sized lists. Finally, the framework put in place to accomplish this separates the concerns of the type system and the potential functions, paving the way for modular addition of different potential functions. Future work could extend the work here to cover additional language features supported in the polynomial AARA literature, like trees [22].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Concurrent Kleene Algebra with Observations: from Hypotheses to Completeness**

Tobias Kappé (✉), Paul Brunet, Alexandra Silva, Jana Wagemaker, and Fabio Zanasi

University College London, London, United Kingdom; tkappe@cs.ucl.ac.uk

**Abstract.** Concurrent Kleene Algebra (CKA) extends basic Kleene algebra with a parallel composition operator, which enables reasoning about concurrent programs. However, CKA fundamentally misses *tests*, which are needed to model standard programming constructs such as conditionals and while-loops. It turns out that integrating tests in CKA is subtle, due to their interaction with parallelism. In this paper we provide a solution in the form of Concurrent Kleene Algebra with Observations (CKAO). Our main contribution is a completeness theorem for CKAO. Our result rests on a more general study of CKA "with hypotheses", of which CKAO turns out to be an instance: this analysis is of independent interest, as it can be applied to extensions of CKA other than CKAO.

*Acknowledgments.* This work was partially supported by the ERC Starting Grant ProFoundNet, grant code 679127. We acknowledge support from the EPSRC grants EP/S028641/1 (A. Silva); EP/R020604/1 (F. Zanasi); EP/R006865/1 (P. Brunet).

## **1 Introduction**

*Kleene algebra with tests* (KAT) is a (co)algebraic framework [17,19] that allows one to study properties of imperative programs with conditional branching, i.e. if-statements and while-loops. KAT is built on Kleene algebra (KA) [6,16], the algebra of regular languages. Both KA and KAT enjoy a rich meta-theory, which makes them a suitable foundation for reasoning about program verification. In particular, it is well-known that the equational theories of KA and KAT characterise rational languages [27,21,16] and guarded rational languages [17], respectively. Efficient procedures for deciding equivalence have been studied in recent years, also in view of recent applications to network verification [3,8,28].

Concurrency is a known source of bugs and hence challenges for verification. Hoare, Struth, and collaborators [11] have proposed an extension of KA, *Concurrent Kleene Algebra* (CKA), as an algebraic foundation for concurrent programming. CKA enriches the basic language of KA with a parallel composition operator $\parallel$. Analogously to KA, CKA also has a semantic characterisation for which the equational theory is complete, in terms of rational languages of *pomsets* (words with a partial order on letters) [23,24,15].

© The Author(s) 2020 J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 381–400, 2020. https://doi.org/10.1007/978-3-030-45231-5_20

The development of CKA raises a natural question, namely how tests, which were essential in KAT for the study of sequential programs, can be integrated into CKA. At first glance, the obvious answer may appear to be to merge KAT with CKA, yielding Concurrent Kleene Algebra with Tests (CKAT) — as attempted in [12]. However, as it turns out, integrating tests into CKA is quite subtle and this naive combination does not adequately capture the behaviour of concurrent programs. In particular, using the CKAT framework of [12] one can prove that for any test b and CKAT program e:

$$0 \leq_{\mathsf{KAT}} b \cdot e \cdot \overline{b} \leq_{\mathsf{CKA}} e \parallel (b \cdot \overline{b}) \equiv_{\mathsf{KAT}} e \parallel 0 \equiv_{\mathsf{CKA}} 0$$

thus $b \cdot e \cdot \overline{b} \equiv_{\mathsf{CKAT}} 0$, meaning no program $e$ can change the outcome of any test $b$. Or equivalently, and undesirably, that any test is an invariant of any program!

The core issue is the identification in KAT of sequential composition · and Boolean conjunction ∧. In the concurrent setting this is not sound as the values of variables — and hence tests — can be changed between the two tests.

In order to fix this issue, we presented *Kleene Algebra with Observations* (KAO) in previous work [13]. Algebraically, KAO differs from KAT in that the conjunction of tests $b \wedge b'$ and their sequential composition $b \cdot b'$ are distinct operations. In particular, $b \wedge b'$ expresses a single test executed *atomically*, whereas $b \cdot b'$ describes two distinct executions, occurring one after the other. As mentioned above, this distinction is crucial when moving from the sequential setting of KA to the concurrent setting of CKA, as actions from another thread that happen to be scheduled after $b$ but before $b'$ may well change the outcome of $b'$.

This newly developed extension of KA enables a novel attempt to enrich CKA with the ability to reason about programs that also have the traditional conditionals: in this paper, we present Concurrent Kleene Algebra with Observations (CKAO) and show that it overcomes the problems present in CKAT.

The traditional plan for developing a variant of (C)KA is to define a separate syntax, semantics, and set of axioms, before establishing a formal correspondence with the base syntax, semantics and axioms of (C)KA proper, and arguing that this correspondence allows one to conclude soundness and completeness of the axioms w.r.t. the semantics, as well as decidability of equivalence in the semantics. Instead of such a tailor-made proof, however, we take a more general approach by first proposing CKA with hypotheses (CKAH) as a formalism for studying extensions of CKA, akin to how Kleene algebra with hypotheses [5,18,20,7] can be used to extend Kleene algebra. We then apply CKAH to study CKAO, but the meta-theory developed can also be applied to extensions other than CKAO.

Using the CKAH formalism, we instantiate CKAO as CKAH with a particular set of hypotheses, and we immediately obtain a syntax and semantics; we can then use the meta-theory of CKAH to argue completeness and decidability in a modular proof, which composes results about CKA [15] and KAO [13].

The technical roadmap of the paper and its contributions are as follows.

**–** We introduce Concurrent Kleene Algebra with Hypotheses (CKAH), a formalism for studying extensions of CKA; this is a concurrent extension of Kleene Algebra with Hypotheses (Section 4). We show how CKAH is sound

with respect to rational pomset languages closed under an operation arising from the set of hypotheses. We propose techniques to argue completeness of the extended set of axioms with respect to the sound model as well as decidability of equivalence, capturing methods commonly used in literature to argue completeness and decidability for extensions of (concurrent) KA.

**–** We prove that CKAO can be presented as an instance of CKAH, for a certain set of hypotheses (Section 5). This gives us a sound model of CKAO 'for free'. We then prove that the axioms of CKAO are also complete for this model, and that equivalence is decidable, using the techniques developed previously.

We conclude this introduction by giving an example of how hypotheses can be added to CKA to include the meaning of primitive actions. Suppose we were designing a DSL for recipes, specifically, the steps necessary, and their order. A recipe to prepare cookies might contain the actions mix (mixing the ingredients), preheat (pre-heating the oven), chill (chilling the dough) and bake (baking the cookies). Using these actions, a recipe like "mix the ingredients until combined; chill the dough while pre-heating the oven; bake cookies in the oven" may be encoded as $\mathsf{mix}^* \cdot (\mathsf{chill} \parallel \mathsf{preheat}) \cdot \mathsf{bake}$. Now, imagine that we have only one oven, meaning that we cannot bake two batches of cookies concurrently. We might encode this restriction on concurrent behaviour by forcing the equation

$$(e \cdot \mathsf{bake} \cdot f) \parallel (g \cdot \mathsf{bake} \cdot h) = (e \cdot \mathsf{bake} \parallel g) \cdot (f \parallel \mathsf{bake} \cdot h) + (e \parallel g \cdot \mathsf{bake}) \cdot (\mathsf{bake} \cdot f \parallel h)$$

As a consequence of this hypothesis, one could then derive properties such as

$$\mathsf{bake} \parallel (\mathsf{bake} \cdot \mathsf{mix}) = \mathsf{bake} \cdot \mathsf{bake} \cdot \mathsf{mix} + \mathsf{bake} \cdot \mathsf{mix} \cdot \mathsf{bake}$$

In a nutshell, this paper provides an algebraic framework — CKAH — together with techniques for soundness and completeness results. The framework is flexible in that different instantiations of the hypotheses generate very different algebraic systems. We provide one instantiation — CKAO — that enables analysis of programs with both concurrency primitives and Boolean assertions. This is the first sound and complete algebraic theory to reason about such programs.

For the sake of brevity, some proofs appear in the extended version [14].

## **2 Preliminaries**

We recall basic definitions on pomset languages, used in the semantics of CKA, which generalise languages to allow letters in words to be partially ordered. We fix a (possibly infinite) alphabet $\Sigma$. When defining sets parametrised by $\Sigma$, say $S(\Sigma)$, if $\Sigma$ is clear from the context we use $S$ to refer to $S(\Sigma)$.

**Posets and Pomsets** Pomsets [9,10] are labelled posets, up to isomorphism.

**Definition 2.1 (Labelled poset).** *A* labelled poset *over* $\Sigma$ *is a tuple* $\mathbf{u} = \langle S, \le, \lambda \rangle$*, where* $S$ *is a finite set (the* carrier *of* $\mathbf{u}$*),* $\le$ *is a partial order on* $S$ *(the* order *of* $\mathbf{u}$*), and* $\lambda : S \to \Sigma$ *is a function (the* labelling *of* $\mathbf{u}$*).*

We will denote labelled posets by bold lower-case letters **u**, **v**, etc. We write $S_{\mathbf{u}}$ for the carrier of $\mathbf{u}$, $\le_{\mathbf{u}}$ for the order of $\mathbf{u}$, and $\lambda_{\mathbf{u}}$ for the labelling of $\mathbf{u}$. We assume that any labelled poset has a carrier that is a subset of some countably infinite set, say $\mathbb{N}$; this allows us to speak about the *set of labelled posets* over $\Sigma$. The precise contents of the carrier, however, are not important; what matters to us is the labels of the points, and the ordering between them.

**Definition 2.2 (Poset isomorphism, pomset).** *Let* $\mathbf{u}, \mathbf{v}$ *be labelled posets over* $\Sigma$*. We say* $\mathbf{u}$ *is* isomorphic *to* $\mathbf{v}$*, denoted* $\mathbf{u} \cong \mathbf{v}$*, if there exists a bijection* $h : S_{\mathbf{u}} \to S_{\mathbf{v}}$ *that preserves labels, and preserves and reflects ordering. More precisely, we require that* $\lambda_{\mathbf{v}} \circ h = \lambda_{\mathbf{u}}$*, and* $s \le_{\mathbf{u}} s'$ *if and only if* $h(s) \le_{\mathbf{v}} h(s')$*.*

*A* pomset *over* $\Sigma$ *is an isomorphism class of labelled posets over* $\Sigma$*, i.e., the class* $[\mathbf{v}] = \{\mathbf{u} : \mathbf{u} \cong \mathbf{v}\}$ *for some labelled poset* $\mathbf{v}$*.*

We write Pom(Σ) for the set of pomsets over Σ, and 1 for the empty pomset. As long as we have countably many pomsets in scope, the above allows us to assume w.l.o.g. that those pomsets are represented by labelled posets with pairwise disjoint carriers; we tacitly make this assumption throughout this paper.

Pomsets can be concatenated, creating a new pomset that contains all events of the operands, with the same label, but which orders all events of the left operand before those of the right one. We can also compose pomsets in parallel, where events of the operands are juxtaposed without any ordering between them.

**Definition 2.3 (Pomset composition).** *Let* $U = [\mathbf{u}]$ *and* $V = [\mathbf{v}]$ *be pomsets over* $\Sigma$*. We write* $U \parallel V$ *for the* parallel composition *of* $U$ *and* $V$*, which is the pomset over* $\Sigma$ *represented by the labelled poset* $\mathbf{u} \parallel \mathbf{v}$*, where*

$$S\_{\mathbf{u}\parallel\mathbf{v}} = S\_{\mathbf{u}} \cup S\_{\mathbf{v}} \qquad \leq\_{\mathbf{u}\parallel\mathbf{v}} = \leq\_{\mathbf{u}} \cup \leq\_{\mathbf{v}} \qquad \lambda\_{\mathbf{u}\parallel\mathbf{v}}(x) = \begin{cases} \lambda\_{\mathbf{u}}(x) & x \in S\_{\mathbf{u}},\\ \lambda\_{\mathbf{v}}(x) & x \in S\_{\mathbf{v}}. \end{cases}$$

*Similarly, we write* U · V *for the* sequential composition *of* U *and* V *, that is, the pomset represented by the labelled poset* **u** · **v***, where*

$$S\_{\mathbf{u}\cdot\mathbf{v}} = S\_{\mathbf{u}\parallel\mathbf{v}} \qquad \qquad \leq\_{\mathbf{u}\cdot\mathbf{v}} = \leq\_{\mathbf{u}} \cup \leq\_{\mathbf{v}} \cup (S\_{\mathbf{u}} \times S\_{\mathbf{v}}) \qquad \qquad \lambda\_{\mathbf{u}\cdot\mathbf{v}} = \lambda\_{\mathbf{u}\parallel\mathbf{v}}.$$

Just like words are built up from the empty word and letters using concatenation, we can build a particular set of pomsets using only sequential and parallel composition; this will be the primary type of pomset that we will use.

**Definition 2.4 (Series-parallel).** *The set of* series-parallel pomsets *(*sp-pomsets*) over* $\Sigma$*, denoted* $\mathsf{SP}(\Sigma)$*, is the smallest set such that* $1 \in \mathsf{SP}(\Sigma)$*,* $\mathsf{a} \in \mathsf{SP}(\Sigma)$ *for every* $\mathsf{a} \in \Sigma$*, and it is closed under parallel and sequential composition.*

The following characterisation of SP is very useful in proofs.

**Theorem 2.5 (Gischer [9]).** *Let* $U = [\mathbf{u}] \in \mathsf{Pom}$*. Then* $U \in \mathsf{SP}$ *if and only if* $U$ *is* N-free*, which is to say that there exist no distinct* $s_0, s_1, s_2, s_3 \in S_{\mathbf{u}}$ *such that* $s_0 \le_{\mathbf{u}} s_1$*,* $s_2 \le_{\mathbf{u}} s_3$*, and* $s_0 \le_{\mathbf{u}} s_3$*, with no other ordering relations between them.*

One way of comparing pomsets is to see whether they have the same events and labels, except that one is "more sequential" in the sense that more events are ordered. This is captured by the notion of *subsumption* [9], defined as follows.

**Definition 2.6 (Subsumption).** *Let* $U = [\mathbf{u}]$ *and* $V = [\mathbf{v}]$*. We say* $U$ is subsumed by $V$*, written* $U \sqsubseteq V$*, if there exists a label- and order-preserving bijection* $h : S_{\mathbf{v}} \to S_{\mathbf{u}}$*. That is,* $\lambda_{\mathbf{u}} \circ h = \lambda_{\mathbf{v}}$ *and if* $s \le_{\mathbf{v}} s'$*, then* $h(s) \le_{\mathbf{u}} h(s')$*.*

Subsumption between sp-pomsets can be characterised as follows [9].

**Lemma 2.7.** *Let* $\sqsubseteq^{\mathsf{sp}}$ *be* $\sqsubseteq$ *restricted to* $\mathsf{SP}$*. Then* $\sqsubseteq^{\mathsf{sp}}$ *is the smallest precongruence (preorder monotone w.r.t. the operators) such that for all* $U, V, W, X \in \mathsf{SP}$*:*

$$(U \parallel V) \cdot (W \parallel X) \sqsubseteq^{\mathfrak{sp}} (U \cdot W) \parallel (V \cdot X)$$
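These definitions can be prototyped directly. The sketch below represents a labelled poset as a triple (point count, generating order pairs, label tuple), leaving the reflexive-transitive closure implicit, and checks the generating inequality of Lemma 2.7 by brute-force search for the bijection of Definition 2.6; all function names are illustrative.

```python
from itertools import permutations

def seq(u, v):
    """Sequential composition: all points of u precede all points of v."""
    (nu, ou, lu), (nv, ov, lv) = u, v
    order = (set(ou)
             | {(i + nu, j + nu) for (i, j) in ov}
             | {(i, j + nu) for i in range(nu) for j in range(nv)})
    return (nu + nv, order, lu + lv)

def par(u, v):
    """Parallel composition: juxtaposition, with no order across operands."""
    (nu, ou, lu), (nv, ov, lv) = u, v
    return (nu + nv, set(ou) | {(i + nu, j + nu) for (i, j) in ov}, lu + lv)

def subsumed(u, v):
    """u is subsumed by v: a label- and order-preserving bijection h
    from v's points onto u's points (Definition 2.6)."""
    (nu, ou, lu), (nv, ov, lv) = u, v
    if nu != nv:
        return False
    for h in permutations(range(nu)):
        labels_ok = all(lu[h[i]] == lv[i] for i in range(nv))
        order_ok = all((h[i], h[j]) in ou for (i, j) in ov)
        if labels_ok and order_ok:
            return True
    return False

atom = lambda a: (1, set(), (a,))
U, V, W, X = atom('a'), atom('b'), atom('c'), atom('d')
# Lemma 2.7's generating inequality: (U || V) . (W || X) is subsumed by
# (U . W) || (V . X), but not the other way around.
assert subsumed(seq(par(U, V), par(W, X)), par(seq(U, W), seq(V, X)))
assert not subsumed(par(seq(U, W), seq(V, X)), seq(par(U, V), par(W, X)))
```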

**CKA: syntax and semantics.** CKA terms are generated by the grammar

$$e, f \in \mathcal{T}(\Sigma) ::= 0 \;\;|\;\; 1 \;\;|\;\; \mathsf{a} \in \Sigma \;\;|\;\; e + f \;\;|\;\; e \cdot f \;\;|\;\; e \parallel f \;\;|\;\; e^*$$

The semantics of CKA is given in terms of *pomset languages*, that is, subsets of $\mathsf{SP}$, which we denote by $2^{\mathsf{SP}}$. Formally, the function $\llbracket - \rrbracket : \mathcal{T} \to 2^{\mathsf{SP}}$ assigning languages to CKA terms is defined as follows:

$$\begin{array}{llll} \llbracket 0 \rrbracket = \emptyset & \llbracket 1 \rrbracket = \{1\} & \llbracket e + f \rrbracket = \llbracket e \rrbracket \cup \llbracket f \rrbracket & \llbracket e \cdot f \rrbracket = \llbracket e \rrbracket \cdot \llbracket f \rrbracket \\ \llbracket e^* \rrbracket = \llbracket e \rrbracket^* & \llbracket \mathsf{a} \rrbracket = \{\mathsf{a}\} & \llbracket e \parallel f \rrbracket = \llbracket e \rrbracket \parallel \llbracket f \rrbracket \end{array}$$

Here, we use the pointwise lifting of sequential and parallel composition from pomsets to pomset languages, i.e., when $\mathcal{U}, \mathcal{V} \subseteq \mathsf{SP}(\Sigma)$, we define

$$\mathcal{U} \cdot \mathcal{V} = \{ U \cdot V : U \in \mathcal{U}, V \in \mathcal{V} \} \qquad \qquad \mathcal{U} \parallel \mathcal{V} = \{ U \parallel V : U \in \mathcal{U}, V \in \mathcal{V} \}$$

Furthermore, the Kleene star of a pomset language $\mathcal{U}$ is defined as $\mathcal{U}^* = \bigcup_{n \in \mathbb{N}} \mathcal{U}^n$, where $\mathcal{U}^0 = \{1\}$ and $\mathcal{U}^{n+1} = \mathcal{U}^n \cdot \mathcal{U}$.
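These liftings are straightforward to prototype. The sketch below is our own illustration (not part of the paper): sp-pomsets are represented as nested tuples, so equality is only up to this syntactic representation (it does not quotient by associativity or commutativity of ‖), and the Kleene star is approximated by its finite powers.

```python
# Series-parallel pomsets as nested tuples: a letter is a string,
# ('seq', U, V) and ('par', U, V) are compositions, and the empty
# pomset 1 is the empty string.
EMPTY = ''

def seq(U, V):
    if U == EMPTY: return V
    if V == EMPTY: return U
    return ('seq', U, V)

def par(U, V):
    if U == EMPTY: return V
    if V == EMPTY: return U
    return ('par', U, V)

def lang_seq(L, K):
    """Pointwise lifting of sequential composition to languages."""
    return {seq(U, V) for U in L for V in K}

def lang_par(L, K):
    """Pointwise lifting of parallel composition to languages."""
    return {par(U, V) for U in L for V in K}

def lang_star(L, n):
    """Approximate L* by the union of L^0 .. L^n."""
    result, power = {EMPTY}, {EMPTY}
    for _ in range(n):
        power = lang_seq(power, L)
        result |= power
    return result

L = {'a', 'b'}
print(lang_seq(L, {'c'}))        # two pomsets: a·c and b·c
print(len(lang_star({'a'}, 3)))  # 4: the pomsets 1, a, a·a, a·a·a
```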

Equivalence of CKA terms can be axiomatised in the style of Kleene algebra. The relation ≡ is the smallest congruence on T (with respect to all operators) such that for all e, f, g ∈ T:

$$e+0\equiv e \qquad e+e\equiv e \qquad e+f\equiv f+e \qquad e+(f+g)\equiv (e+f)+g$$

$$e\cdot(f\cdot g)\equiv (e\cdot f)\cdot g \qquad e\cdot(f+g)\equiv e\cdot f+e\cdot g \qquad (e+f)\cdot g\equiv e\cdot g+f\cdot g$$

$$e\cdot 1\equiv e\equiv 1\cdot e \qquad e\cdot 0\equiv 0\equiv 0\cdot e \qquad e\parallel f\equiv f\parallel e \qquad e\parallel 1\equiv e \qquad e\parallel 0\equiv 0$$

$$e\parallel (f\parallel g)\equiv (e\parallel f)\parallel g \qquad e\parallel (f+g)\equiv e\parallel f+e\parallel g \qquad 1+e\cdot e^\*\equiv e^\*\equiv 1+e^\*\cdot e$$

$$e+f\cdot g\le g \implies f^\*\cdot e\le g \qquad e+f\cdot g\le f\implies e\cdot g^\*\leq f$$

in which e ≤ f abbreviates e + f ≡ f. The final (conditional) axioms are referred to as the *least fixpoint axioms*.

Laurence and Struth [23] proved this axiomatisation to be sound and complete. A decision procedure was proposed in [4].

**Theorem 2.8 (Soundness, completeness, decidability).** *Let* e, f ∈ T*. We have* e ≡ f *if and only if* ⟦e⟧ = ⟦f⟧*, and it is decidable whether* ⟦e⟧ = ⟦f⟧*.*

Readers familiar with CKA will notice that the algebra defined here is not in fact CKA as defined in [11]: the characteristic axiom of CKA, the exchange law, has been omitted. However, as we show in Section 4.2, the standard definition of CKA, as well as its completeness proof [15], may be recovered using hypotheses.

## **3 Pomset contexts**

The linear one-dimensional structure of words makes it straightforward to define occurrences of subwords: if one wants to state that a word w appears in another word v, one can simply say that v = xwy for some x and y. Due to the two-dimensional nature of pomsets, it is not straightforward to define when a pomset occurs inside another pomset, because the pomset could appear below a parallel, which is nested in a sequential, which is in a parallel, etc. In what follows we define *pomset contexts*, which will enable us to talk about pomset factorisations in a similar fashion as we do for words, and prove some useful properties about these.

**Definition 3.1.** *Let* ∗ *be a symbol not occurring in* Σ*. A* pomset context *is a pomset over* Σ ∪ {∗} *with exactly* one *node labelled by* ∗*. More precisely,* C *is a pomset context if* C = [**c**] *such that there is exactly one* s<sub>∗</sub> ∈ S**c** *with* λ**c**(s<sub>∗</sub>) = ∗*.*

Intuitively, ∗ is a placeholder or gap where another pomset can be inserted. We write PC(Σ) for the set of pomset contexts over Σ, and PCsp(Σ) for the series-parallel pomset contexts over Σ.

Given a C ∈ PC and U ∈ Pom, we can "plug" U into the gap left in C to obtain the pomset C[U] ∈ Pom. More precisely, let U = [**u**] and C = [**c**] with **u** disjoint from **c**. We write C[U] for the pomset represented by **c**[**u**], where S**c**[**u**] = S**u** ∪ (S**c** − {s<sub>∗</sub>}), and λ**c**[**u**](s) is given by λ**c**(s) if s ∈ S**c** − {s<sub>∗</sub>}, and by λ**u**(s) when s ∈ S**u**; lastly, ≤**c**[**u**] is the smallest relation on S**c**[**u**] satisfying

$$\frac{s \leq_{\mathbf{u}} s'}{s \leq_{\mathbf{c}[\mathbf{u}]} s'} \qquad \frac{s \leq_{\mathbf{c}} s'}{s \leq_{\mathbf{c}[\mathbf{u}]} s'} \qquad \frac{s_\* \leq_{\mathbf{c}} s \quad s' \in S_{\mathbf{u}}}{s' \leq_{\mathbf{c}[\mathbf{u}]} s} \qquad \frac{s \leq_{\mathbf{c}} s_\* \quad s' \in S_{\mathbf{u}}}{s \leq_{\mathbf{c}[\mathbf{u}]} s'}$$

It follows easily that ≤**c**[**u**] is a partial order. We may also apply contexts to languages: if L ⊆ Pom and C ∈ PC, the language C[L] is defined as {C[U] : U ∈ L}.
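On a nested-tuple representation of sp-terms, plugging amounts to substituting for the unique hole. The sketch below is our own illustration (names are ours), with contexts written as sp-terms containing one `'*'` leaf.

```python
HOLE = '*'

def contains_hole(C):
    """Does sp-term C contain the hole '*'?"""
    if C == HOLE:
        return True
    return isinstance(C, tuple) and (contains_hole(C[1]) or contains_hole(C[2]))

def plug(C, U):
    """Replace the unique '*' node in context C by pomset U."""
    if C == HOLE:
        return U
    op, left, right = C  # C must be a composite containing the hole
    if contains_hole(left):
        return (op, plug(left, U), right)
    return (op, left, plug(right, U))

# C = (a ‖ *) · b ; plugging in c yields (a ‖ c) · b
C = ('seq', ('par', 'a', HOLE), 'b')
print(plug(C, 'c'))  # ('seq', ('par', 'a', 'c'), 'b')
```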

We now prove some properties of contexts that will be useful later in our technical development. First, we note that pomset contexts respect subsumption.

**Lemma 3.2.** *Let* C, D ∈ PC*,* U ∈ Pom*. If* C ⊑ D*, then* C[U] ⊑ D[U]*.*

Series-parallel pomset contexts can be given an inductive characterisation.

**Lemma 3.3.** PCsp *is the smallest pomset language* L *satisfying*

$$\frac{}{\ast \in L} \qquad \frac{U \in \mathsf{SP} \quad C \in L}{U \cdot C \in L} \qquad \frac{C \in L \quad V \in \mathsf{SP}}{C \cdot V \in L} \qquad \frac{U \in \mathsf{SP} \quad C \in L}{U \parallel C \in L}$$

We will identify *totally ordered pomsets* with words, i.e., Σ<sup>∗</sup> <sup>⊆</sup> SP. If the pomset U inserted in a context C is a non-empty word, and the resulting pomset is a parallel pomset, then we can infer how to factorise C.

**Lemma 3.4.** *Let* C ∈ PCsp *be a pomset context, let* V, W ∈ Pom*, and let* U ∈ Σ<sup>∗</sup> *be non-empty. If* C[U] = V ‖ W*, then there exists a* C′ ∈ PCsp *such that either* C = C′ ‖ W *and* C′[U] = V*, or* C = V ‖ C′ *and* C′[U] = W*.*

Application of series-parallel contexts preserves series-parallel pomsets.

**Lemma 3.5.** *Let* C ∈ PCsp*. If* U ∈ SP*, then* C[U] ∈ SP *as well.*

If we plug the empty pomset into a context, then any subsumed pomset can be obtained by plugging the empty pomset into a subsumed context. If the subsumed pomset is series-parallel, then so is the subsumed context.

**Lemma 3.6.** *Let* C ∈ PC *and* V ∈ Pom *with* V ⊑ C[1]*. We can construct* C′ ∈ PC *such that* C′ ⊑ C *and* C′[1] = V*. Moreover, if* V ∈ SP*, then* C′ ∈ PCsp*.*

An analogue to the previous lemma can be obtained if instead of the empty pomset one inserts a single letter pomset a.

**Lemma 3.7.** *Let* C ∈ PC*,* V ∈ Pom *and* a ∈ Σ *with* V ⊑ C[a]*. We can construct* C′ ∈ PC *s.t.* C′ ⊑ C *and* C′[a] = V*. Moreover, if* V ∈ SP*, then* C′ ∈ PCsp*.*

## **4 Concurrent Kleene Algebra with Hypotheses**

Kleene algebra has basic axioms about how program composition operators should work in general, and hence does not make any assumptions about how these operators work on specific programs. When reasoning about equivalence in a programming language, however, it makes sense to embed domain-specific truths about the operators into the axioms. For instance, if a programming language includes assignments to variables, then subsequent assignments to the same variable could be merged into one, giving rise to an equation such as

$$x \leftarrow m \le x \leftarrow n \cdot x \leftarrow m,\tag{1}$$

which says that the behaviour of first assigning n, then m to x (on the right) includes the behaviour of simply assigning m to x directly (on the left).

Kleene algebra with hypotheses (KAH) [5,18,20,7] enables the addition of extra axioms, called *hypotheses*, to the axioms of KA. The appeal of KAH is that it allows a wide range of such hypotheses about programs to be added to the equational theory, while retaining the theoretical boilerplate of KA. In particular, it turns out that we can derive a sound model for any set of hypotheses, using the language model that is sound for KA proper [7]. Moreover, the completeness and decidability results that hold for KA can be leveraged to obtain completeness and decidability results for some specific types of hypotheses [5,20,7]; in general, equivalence under other hypotheses may turn out to be undecidable [18].

In this section, we propose a generalisation of so-called Kleene algebra with hypotheses to a concurrent setting, showing how one can obtain a sound (pomset language) model for any set of hypotheses. We then discuss a number of techniques that allow one to prove completeness and decidability of the resulting system for a large set of hypotheses, by relying on analogous results about CKA.

**Definition 4.1.** *A* hypothesis *is an inequation* e ≤ f *where* e, f ∈ T*. When* H *is a set of hypotheses, we write* ≡<sup>H</sup> *for the smallest congruence on* T *generated by the hypotheses in* H *as well as the axioms and implications that build* ≡*. More concretely, whenever* e ≤ f ∈ H*, also* e ≤<sup>H</sup> f*.*

A hypothesis that declares two programs to be equivalent, such as in (1), can be encoded by including both e <sup>≤</sup> f and f <sup>≤</sup> e in H.

*Example 4.2.* Suppose the set of primitive actions Σ includes the increments of the form incr x, as well as a statement print, which writes the complete state of the machine (including variables) on the standard output. Since we would like to depict the state consistently, the state should not change while the output is rendered; hence, print cannot be executed concurrently with any other action. Instead, when a program containing print is scheduled to run in parallel with an assignment, it must be interleaved such that the assignment runs either entirely before or after print. To encode this, we can include in H the hypotheses

$$\mathsf{incr}\,x \parallel \mathsf{print} \;=\; \mathsf{incr}\,x \cdot \mathsf{print} + \mathsf{print} \cdot \mathsf{incr}\,x$$

for all variables x. This allows us to prove, for instance, that

$$\mathsf{print} \cdot \mathsf{incr}\,x \cdot \mathsf{incr}\,x \cdot \mathsf{print} \;\leq^{H}\; \left(\mathsf{incr}\,x \parallel \mathsf{print}\right)^\*$$

That is, if we run some number of increments and print statements in parallel, it is possible that x is incremented twice between print statements.

To obtain a model of CKAH, it is not enough to use ⟦−⟧, as some programs equated by the hypotheses might have different semantics. To get around this, we adapt the method from [7]: take ⟦−⟧ as a base semantics, and adapt the resulting language using the hypotheses, such that the pomsets that could be obtained by rearranging the term using the hypotheses are also present in the language:

**Definition 4.3.** *Let* L ⊆ Pom*. We define the* H*-closure of* L*, written* L↓<sup>H</sup>*, as the smallest language containing* L *such that for all* e ≤ f ∈ H *and* C ∈ PCsp*, if* C[⟦f⟧] ⊆ L↓<sup>H</sup>*, then* C[⟦e⟧] ⊆ L↓<sup>H</sup>*. Formally,* L↓<sup>H</sup> *may be described as the smallest language satisfying the following inference rules:*

$$\overline{L\subseteq L\downarrow^{H}}\qquad\qquad\frac{e\leq f\in H\qquad C\in \mathsf{PC}^{\mathsf{sp}}\qquad C[[f]]\subseteq L\downarrow^{H}}{C[[e]]\subseteq L\downarrow^{H}}$$

*Example 4.4.* Continuing with H and Σ as in the previous examples, note that if L = ⟦incr x ‖ print⟧, then incr x · print ∈ L↓<sup>H</sup>. Choose C = ∗; we have C[⟦incr x ‖ print⟧] = ⟦incr x ‖ print⟧. Because incr x · print + print · incr x ≤ incr x ‖ print ∈ H, and for all U ∈ ⟦incr x ‖ print⟧ we have C[U] ∈ L ⊆ L↓<sup>H</sup>, we get C[⟦incr x · print + print · incr x⟧] ⊆ L↓<sup>H</sup>, and therefore incr x · print ∈ L↓<sup>H</sup>.

We observe the following useful properties about the interaction between closure and other operators on pomset languages.

**Lemma 4.5.** *Let* L, K ⊆ Pom *and* C ∈ PCsp*. The following hold:*

*1.* L ⊆ K↓<sup>H</sup> *iff* L↓<sup>H</sup> ⊆ K↓<sup>H</sup>*.*
*2. If* L ⊆ K*, then* L↓<sup>H</sup> ⊆ K↓<sup>H</sup>*.*
*3.* (L ∪ K)↓<sup>H</sup> = (L↓<sup>H</sup> ∪ K↓<sup>H</sup>)↓<sup>H</sup>*.*
*4.* (L · K)↓<sup>H</sup> = (L↓<sup>H</sup> · K↓<sup>H</sup>)↓<sup>H</sup>*.*
*5.* (L ‖ K)↓<sup>H</sup> = (L↓<sup>H</sup> ‖ K↓<sup>H</sup>)↓<sup>H</sup>*.*
*6.* (L<sup>∗</sup>)↓<sup>H</sup> = ((L↓<sup>H</sup>)<sup>∗</sup>)↓<sup>H</sup>*.*
*7. If* L↓<sup>H</sup> ⊆ K↓<sup>H</sup>*, then* C[L]↓<sup>H</sup> ⊆ C[K]↓<sup>H</sup>*.*
*8. If* L ⊆ SP*, then* L↓<sup>H</sup> ⊆ SP*.*

*Remark 4.6.* Property (1) states that −↓<sup>H</sup> is a closure operator. However, it is not in general a Kuratowski closure operator [22], since it fails to commute with union. For instance, let a, b, c ∈ Σ and H = {a ≤ b + c}; then {b}↓<sup>H</sup> ∪ {c}↓<sup>H</sup> = {b, c}, while a ∈ ({b} ∪ {c})↓<sup>H</sup>.
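This counterexample can be computed directly. The sketch below is our own illustration, specialised to the hypothesis a ≤ b + c and to single-letter languages, where only the trivial context ∗ matters: the closure rule adds 'a' exactly when both 'b' and 'c' (all of ⟦b + c⟧) are already present.

```python
def close(L):
    """H-closure of a language of single letters under H = {a <= b + c}:
    add 'a' once the whole of [[b + c]] = {'b', 'c'} is contained in L."""
    L = set(L)
    changed = True
    while changed:
        changed = False
        if {'b', 'c'} <= L and 'a' not in L:
            L.add('a')
            changed = True
    return L

print(close({'b'}) | close({'c'}))  # the set {'b', 'c'}: neither closure adds 'a'
print(close({'b', 'c'}))            # the set {'a', 'b', 'c'}
```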

Using Lemma 4.5, we can show that, if we combine the semantics ⟦−⟧ with H-closure, we obtain a sound semantics for CKA with hypotheses H.

#### **Lemma 4.7 (Soundness).** *If* e ≡<sup>H</sup> f*, then* ⟦e⟧↓<sup>H</sup> = ⟦f⟧↓<sup>H</sup>*.*

The converse of the above, where semantic equivalence is sufficient to establish axiomatic equivalence, is called *completeness*. Similarly, we may also be interested in *deciding* whether ⟦e⟧↓<sup>H</sup> and ⟦f⟧↓<sup>H</sup> coincide.

**Definition 4.8.** *Let* e, f ∈ T*.*

*(i) If* ⟦e⟧↓<sup>H</sup> = ⟦f⟧↓<sup>H</sup> *implies* e ≡<sup>H</sup> f*, then* H *is called* complete*.*
*(ii) If* ⟦e⟧↓<sup>H</sup> = ⟦f⟧↓<sup>H</sup> *is decidable, then* H *is said to be* decidable*.*

Note that, in the special case where H = ∅, we know that H is complete and decidable by Theorem 2.8. One method to find out whether H is complete or decidable is to reduce the problem to this special case. More concretely, suppose we know ⟦e⟧↓<sup>H</sup> = ⟦f⟧↓<sup>H</sup>, and want to establish that e ≡<sup>H</sup> f. If we could find a set of hypotheses H′ that is complete, and we could map e and f to terms r(e) and r(f) such that ⟦r(e)⟧↓<sup>H′</sup> = ⟦r(f)⟧↓<sup>H′</sup>, then we would have r(e) ≡<sup>H′</sup> r(f). If we could then "lift" that equivalence to prove e ≡<sup>H</sup> f, we would be done. Similarly, if we knew that ⟦r(e)⟧↓<sup>H′</sup> = ⟦r(f)⟧↓<sup>H′</sup> is equivalent to ⟦e⟧↓<sup>H</sup> = ⟦f⟧↓<sup>H</sup>, we could decide the latter. To formalise this intuition, we first need the following.

**Definition 4.9.** *We say that* H implies H′ *if we can use the hypotheses in* H *to prove those of* H′*, i.e., if for every hypothesis* e ≤ f ∈ H′ *it holds that* e ≤<sup>H</sup> f*.*

Implication relates to equivalence and closure as follows.

**Lemma 4.10.** *Let* H *and* H′ *be sets of hypotheses such that* H *implies* H′*.*

*(i) If* e, f ∈ T *with* e ≡<sup>H′</sup> f*, then* e ≡<sup>H</sup> f*.*
*(ii) If* L ⊆ Pom*, then* L↓<sup>H′</sup> ⊆ L↓<sup>H</sup>*.*
*(iii) If* L ⊆ Pom*, then* (L↓<sup>H′</sup>)↓<sup>H</sup> = L↓<sup>H</sup>*.*

If H implies H′ and vice versa, then H is complete (resp. decidable) precisely when H′ is. In general, however, this is not very helpful; we need something more asymmetrical, in order to get from a complicated set of hypotheses H to a simpler set of hypotheses H′, where completeness or decidability might be easier to prove. Ideally, we would like to reduce to H′ = ∅, which is complete and decidable.

One way to formalise this idea of a reduction is as follows.

**Definition 4.11.** *Let* H *and* H′ *be sets of hypotheses such that* H *implies* H′*. A map* r : T → T *is a* reduction *from* H *to* H′ *when both of the following are true:*

*(i) for* e ∈ T*, we have* e ≡<sup>H</sup> r(e)*; and*
*(ii) for* e, f ∈ T*, if* ⟦e⟧↓<sup>H</sup> = ⟦f⟧↓<sup>H</sup>*, then* ⟦r(e)⟧↓<sup>H′</sup> = ⟦r(f)⟧↓<sup>H′</sup>*.*

*We call* H reducible *to* H′ *if there exists a reduction from* H *to* H′*.*

It is straightforward to show that reductions do indeed carry over completeness and decidability results, in the following sense.

**Lemma 4.12.** *Suppose* H *is reducible to* H′*. If* H′ *is complete (respectively decidable), then so is* H*.*

*Example 4.13.* Let Σ = {a, b, c} and let H = {a ≤ b}. We can define for e ∈ T the term r(e) ∈ T, which is e but with every occurrence of b replaced by a + b. For instance, r(a · b<sup>∗</sup> · c) = a · (a + b)<sup>∗</sup> · c. An inductive argument on the structure of e shows that r reduces H to ∅, and hence H is complete and decidable.
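This reduction is a purely syntactic substitution. The sketch below is our own illustration, with terms encoded as nested tuples (operator symbol first, arguments after).

```python
# Terms as nested tuples: ('+', e, f), ('.', e, f), ('par', e, f),
# ('*', e); letters are strings, 0 and 1 are the integers 0 and 1.
def r(e):
    """Replace every occurrence of the letter b by a + b (Example 4.13)."""
    if e == 'b':
        return ('+', 'a', 'b')
    if isinstance(e, tuple):
        return (e[0],) + tuple(r(x) for x in e[1:])
    return e

# r(a · b* · c) = a · (a + b)* · c
e = ('.', 'a', ('.', ('*', 'b'), 'c'))
print(r(e))  # ('.', 'a', ('.', ('*', ('+', 'a', 'b')), 'c'))
```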

It is not very hard to show that reductions can be chained, as follows.

**Lemma 4.14.** *If* H *reduces to* H′*, which reduces to* H″*, then* H *reduces to* H″*.*

Another way of reducing H is to find two sets of hypotheses H<sub>0</sub> and H<sub>1</sub>, and reduce each of those to another set of hypotheses H′ [7]. The idea is that a proof of e ≡<sup>H</sup> f can be split up into a phase where we find e′, f′ ∈ T such that e ≡<sup>H<sub>0</sub></sup> e′ and f ≡<sup>H<sub>0</sub></sup> f′, after which we find e″, f″ ∈ T with e′ ≡<sup>H<sub>1</sub></sup> e″ and f′ ≡<sup>H<sub>1</sub></sup> f″. Finally, we establish that e″ ≡<sup>H′</sup> f″, before lifting those equivalences to H, concluding

$$e \equiv^{H} e' \equiv^{H} e'' \equiv^{H} f'' \equiv^{H} f' \equiv^{H} f$$

One way of achieving this is as follows.

**Definition 4.15.** *We say that* H factorises *into* H<sub>0</sub> *and* H<sub>1</sub> *if* H *implies both* H<sub>0</sub> *and* H<sub>1</sub>*, and for all* L ⊆ SP *we have that* L↓<sup>H</sup> = (L↓<sup>H<sub>0</sub></sup>)↓<sup>H<sub>1</sub></sup>*.*

In order to use factorisation to compose simpler reductions into more complicated ones, we need a slightly stronger notion of reduction, as follows.

**Definition 4.16.** *We say that* r *is a* strong reduction *from* H *to* H′ *if it is a reduction such that for* e ∈ T*, it holds that* ⟦e⟧↓<sup>H</sup> = ⟦r(e)⟧↓<sup>H′</sup>*.*

Note that this additional condition essentially strengthens the second condition in Definition 4.11. Factorisation then lets us compose strong reductions.

**Lemma 4.17.** *Suppose* H *factorises into* H<sub>0</sub> *and* H<sub>1</sub>*, and both* H<sub>0</sub> *and* H<sub>1</sub> *strongly reduce to* H′*. Then* H *strongly reduces to* H′*.*

The remainder of this section is devoted to developing techniques that can be used to design reductions, based on the properties of the sets of hypotheses under consideration. Using the lemmas we have established so far, these techniques may then be leveraged to obtain completeness and decidability results.

#### **4.1 Reification**

It can happen that the hypotheses in H impose an algebraic structure on the letters in Σ; for instance, as we will see later on, the letters in Σ could be propositional terms, whose equivalence is mediated by the axioms of Boolean algebra. In order to peel away this layer of axioms and reduce to a smaller H′, we can try to reduce to terms over a smaller alphabet, making the algebraic structure on the letters irrelevant to equivalence. In a sense, performing this kind of reduction is like showing that the equivalences between letters from the hypotheses can already be guaranteed by replacing them with the right terms.

*Example 4.18.* Let Σ be the set of group terms over a (finite) alphabet Λ, that is, Σ consists of the terms generated by the grammar $g, h ::= u \mid \mathsf{a} \in \Lambda \mid g \circ h \mid \overline{g}$. Furthermore, let ≡<sub>G</sub> be the smallest congruence generated by the group axioms, i.e., for all g, h, i ∈ Σ it holds that

$$g \circ (h \circ i) \equiv\_G (g \circ h) \circ i \qquad \quad g \circ u \equiv\_G g \equiv\_G u \circ g \qquad \quad \overline{g} \circ g \equiv\_G u \equiv\_G g \circ \overline{g}$$

Lastly, let group = {g ≤ h : g ≡<sub>G</sub> h}. We can then define a reduction from group to ∅ by replacing every letter (group term) in a term e with its reduced form, that is, with the (unique) equivalent group term of minimum size. For instance, if Λ = {a, b, c}, then we send the term $\mathsf{a} \circ \overline{\mathsf{a}} \circ \overline{\mathsf{b}} \circ \mathsf{c} \circ \overline{\mathsf{c}}$ to the term $\overline{\mathsf{b}}$.
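Computing the reduced form of a group term is the usual free-group reduction: cancel adjacent inverse pairs until none remain. The sketch below is our own illustration, representing a group term (after flattening with ∘) as a word of signed letters, with the unit u as the empty word; a single stack-based pass suffices.

```python
def reduce_group(word):
    """Free-group reduction: cancel adjacent inverse pairs.
    A letter is a pair (name, sign); ('a', 1) is a, ('a', -1) its inverse."""
    out = []
    for letter in word:
        if out and out[-1][0] == letter[0] and out[-1][1] == -letter[1]:
            out.pop()  # g followed by its inverse cancels to the unit u
        else:
            out.append(letter)
    return out

# a ∘ ā ∘ b ∘ c ∘ c̄ reduces to b
w = [('a', 1), ('a', -1), ('b', 1), ('c', 1), ('c', -1)]
print(reduce_group(w))  # [('b', 1)]
```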

For the remainder of this section, we fix a subalphabet Γ ⊆ Σ. When r : Σ → T(Γ), we extend r to a map from T(Σ) to T(Γ), by inductively applying r to terms. We can also apply r to a series-parallel pomset, obtaining a pomset language. More precisely, when U is a pomset, we define r(U) as follows:

$$r(1) = \{1\} \qquad r(\mathbf{a}) = [\![r(\mathbf{a})]\!] \qquad r(U \cdot V) = r(U) \cdot r(V) \qquad r(U \parallel V) = r(U) \parallel r(V)$$

Lastly, when L ⊆ SP, we write r(L) for the set $\bigcup \{ r(U) : U \in L \}$.

The following then formalises the idea of reducing by replacing letters.

**Definition 4.19.** *A map* r : Σ → T(Γ) *is a* reification *from* H *to* H′ *if:*

*(i) for* a ∈ Σ*, we have* a ≡<sup>H</sup> r(a)*;*
*(ii) for* a ∈ Γ*, we have* r(a) = a*;*
*(iii) for* e ≤ f ∈ H′*, we have* e, f ∈ T(Γ)*; and*
*(iv) for* e ≤ f ∈ H*, we have* r(e) ≤<sup>H′</sup> r(f)*.*
*Example 4.20.* Continuing with the previous example, let r be the map that sends a group term to its reduced form; we claim that r is a reification from group to ∅. By definition, we know that for a group term g ∈ Σ, we have r(g) ≡<sub>G</sub> g, and hence r(g) ≡<sup>group</sup> g. Furthermore, the reduced form of a reduced term is that term itself; hence, the second condition is satisfied. The third condition holds trivially. Lastly, if e ≤ f ∈ group, then e, f ∈ Σ are such that e ≡<sub>G</sub> f. Since reduced forms are unique, we then know that r(e) = r(f), and hence r(e) ≤<sup>∅</sup> r(f).

We have the following general properties of a map r, which we will use in demonstrating how to obtain a reduction from a reification.

**Lemma 4.21.** *Let* r : Σ → T *be some map.*


The following technical lemma is a consequence of property (iv).

**Lemma 4.22.** *If* r *is a reification and* L ⊆ SP(Σ)*, then* r(L↓<sup>H</sup>) ⊆ r(L)↓<sup>H′</sup>*.*

Using this, we can then show how to obtain a reduction from a reification.

**Lemma 4.23.** *If* H *implies* H′ *and* r *is a reification from* H *to* H′*, then* r *is a reduction from* H *to* H′*.*

*Proof.* The first condition, i.e., that for e ∈ T we have e ≡<sup>H</sup> r(e), can be checked using the first property of reification by induction on the structure of e. It thus remains to check the second condition; we do this by proving that for all e ∈ T(Σ) we have r(⟦e⟧↓<sup>H</sup>) = ⟦r(e)⟧↓<sup>H′</sup>. To this end, we derive as follows:

$$\begin{aligned} r([\![e]\!] \downarrow^{H}) & \subseteq r([\![e]\!]) \downarrow^{H'} & \text{(Lemma 4.22)}\\ &= [\![r(e)]\!] \downarrow^{H'} & \text{(Lemma 4.21(iii))}\\ &\subseteq r([\![r(e)]\!] \downarrow^{H'}) & \text{(property (ii))}\\ &\subseteq r([\![r(e)]\!] \downarrow^{H}) & \text{(Lemma 4.10(ii))}\\ &= r([\![e]\!] \downarrow^{H}) & \text{(property (i), soundness)} \end{aligned}$$

Specifically, in the third step, property (ii) ensures that for L ⊆ SP(Γ) we have L ⊆ r(L). We can use this property because H′-closure preserves the Γ-language by property (iii). This completes the proof.

### **4.2 Factoring the exchange law**

In the basic axioms that generate ≡, there is no interaction between sequential and parallel composition. One sensible way of adding that kind of interaction is, as suggested by Hoare, Struth and collaborators [11], by adding an axiom of the form (e ‖ f) · (g ‖ h) ≤ (e · g) ‖ (f · h), known as the *exchange law*. Essentially, this axiom encodes the possibility of (partial) interleaving: when e · g runs in parallel with f · h, one possible behaviour is that first e runs in parallel with f, and then g runs in parallel with h. The core observation of this section is that the exchange law can be treated as another set of hypotheses, as we show below, and this can then be used to recover the completeness result of CKA [15].

**Definition 4.24.** *We write* exch *for the set*

$$\{ (e \parallel f) \cdot (g \parallel h) \le (e \cdot g) \parallel (f \cdot h) : e, f, g, h \in \mathcal{T} \}$$

The semantic effect of adding exch to our hypotheses is that, if U is a pomset in a series-parallel language L, and V is a series-parallel pomset subsumed by U, then V is in the exch-closure of L. Intuitively, the exch-closure adds pomsets that are more sequential, i.e., have more ordering, than the ones already in L. Indeed, exch-closure coincides with the downward closure w.r.t. ⊑<sup>sp</sup>.

**Lemma 4.25.** *Let* L ⊆ SP *and* U ∈ SP*. Now* U ∈ L↓<sup>exch</sup> *if and only if there exists a* V ∈ L *such that* U ⊑<sup>sp</sup> V*.*

We have previously shown that exch is complete [15]; as a matter of fact, the pivotal result from op. cit. can be presented as follows.

**Theorem 4.26.** *The set of hypotheses* exch *is strongly reducible to* ∅*.*

When exch is contained in our hypotheses, it is not immediately clear whether those hypotheses can be reduced. What we can do is try to factorise our hypotheses into exch and some residual set of hypotheses, and prove strong reducibility for that residual set. To this end, we first note that, in some circumstances, the H-closure of the exch-closure remains downward-closed w.r.t. ⊑<sup>sp</sup>.

**Lemma 4.27.** *Suppose that for each* e ≤ f ∈ H *we have that* e = 1 *or* e = a *for some* a ∈ Σ*, and let* L ⊆ SP*. If* U, V ∈ SP *are such that* U ⊑<sup>sp</sup> V *and* V ∈ (L↓<sup>exch</sup>)↓<sup>H</sup>*, then* U ∈ (L↓<sup>exch</sup>)↓<sup>H</sup>*.*

Using this fact, we can now show that, under the same precondition, exch ∪ H factorises into exch and H. This factorisation is what we were looking for: it tells us that whenever H strongly reduces to ∅, so does H ∪ exch.

**Lemma 4.28.** *Suppose that for each* e ≤ f ∈ H *we have that* e = 1*, or* e = a *for some* a ∈ Σ*. Then* H ∪ exch *factorises into* exch *and* H*.*

*Proof.* Since H, exch ⊆ H ∪ exch, it should be obvious that H ∪ exch implies both H and exch. It remains to show that, if L ⊆ SP, then (L↓<sup>exch</sup>)↓<sup>H</sup> = L↓<sup>H∪exch</sup>. The inclusion from left to right is a consequence of Lemma 4.10(ii)–(iii).

For the other inclusion, we show that if A ⊆ L↓<sup>H∪exch</sup>, then A ⊆ (L↓<sup>exch</sup>)↓<sup>H</sup>. The proof proceeds by induction on the construction of A ⊆ L↓<sup>H∪exch</sup>. In the base, we have that A ⊆ L↓<sup>H∪exch</sup> because A = L; in that case, A ⊆ L↓<sup>exch</sup> ⊆ (L↓<sup>exch</sup>)↓<sup>H</sup>.

For the inductive step, A ⊆ L↓<sup>H∪exch</sup> because there exist e ≤ f ∈ H ∪ exch and C ∈ PCsp such that A = C[⟦e⟧], and C[⟦f⟧] ⊆ L↓<sup>H∪exch</sup>. By induction, we then know that C[⟦f⟧] ⊆ (L↓<sup>exch</sup>)↓<sup>H</sup>. On the one hand, if e ≤ f ∈ H, then A = C[⟦e⟧] ⊆ (L↓<sup>exch</sup>)↓<sup>H</sup> immediately. On the other hand, if e ≤ f ∈ exch, then ⟦e⟧ ⊑<sup>sp</sup> ⟦f⟧, and hence C[⟦e⟧] ⊑<sup>sp</sup> C[⟦f⟧] by Lemma 3.2. By Lemma 3.5 and Lemma 4.27, it then follows that A = C[⟦e⟧] ⊆ (L↓<sup>exch</sup>)↓<sup>H</sup>.

## **4.3 Lifting**

A number of reduction procedures already exist at the level of Kleene algebra [20,7]; ideally, one would like to lift those procedures to CKA.

*Example 4.29.* The reductions in Example 4.13 and Example 4.18 were worked out for terms without ‖, and then extended inductively, by defining the reduction of e ‖ f to be the parallel composition of the reductions of e and f respectively.

As a non-example, consider H = {a ≤ 1}. Even though this hypothesis can be reduced to ∅ within Kleene algebra [5], it is not obvious how this would work for pomset languages. In particular, if 1 ∈ L, then 1 ‖ ··· ‖ 1 ∈ L for any number of 1's, and hence a ‖ ··· ‖ a ∈ L↓<sup>H</sup> for any number of a's. This precludes the possibility of a strong reduction to ∅, because ⟦1⟧↓<sup>H</sup> is a pomset language of unbounded (parallel) width, which cannot be expressed by any e ∈ T [25].

We now establish a set of sufficient conditions for such a lifting to work. To this end, we first formally define Kleene algebra syntax, axioms and semantics.

**Definition 4.30.** *Write* TKA *for the set of* Kleene algebra terms*, i.e., the terms in* T *that do not contain* ‖*. Furthermore, we write* ≡<sub>KA</sub> *for the smallest congruence on* TKA *that is generated by the axioms of* ≡ *that do not involve* ‖*.*

When e ∈ TKA, it is not hard to see that ⟦e⟧ contains only totally ordered pomsets, i.e., words. Using these definitions, we can now specialise the notions of hypotheses, context, and closure to the sequential setting, as follows.

**Definition 4.31.** *The relation* ≡<sup>H</sup><sub>KA</sub> *is generated from* H *and* ≡<sub>KA</sub> *as before. A context* C ∈ PCsp *is* sequential *if it is totally ordered, i.e., if it is a word with one occurrence of* ∗*; we write* PCseq *for the set of sequential contexts.*

*Given a set of hypotheses* H *and a language* L ⊆ Σ<sup>∗</sup>*, we define the* sequential closure *of* L *with respect to* H*, written* L↓<sup>H</sup><sub>seq</sub>*, as the least language containing* L *such that for all* e ≤ f ∈ H *and* C ∈ PCseq*, if* C[⟦f⟧] ⊆ L↓<sup>H</sup><sub>seq</sub>*, then* C[⟦e⟧] ⊆ L↓<sup>H</sup><sub>seq</sub>*.*

If ‖ does not occur in any hypothesis, then the definition of sequential closure coincides with the closure operator from [7]. Thus, if L ⊆ Σ<sup>∗</sup>, then L↓<sup>H</sup><sub>seq</sub> ⊆ Σ<sup>∗</sup>.
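For finite word languages, the sequential closure can be computed as a fixpoint. The sketch below is our own illustration, restricted to hypotheses whose two sides each denote a single word; a sequential context is then just an occurrence of the right-hand side inside a word of the language.

```python
def seq_close(L, H):
    """Sequential closure of a finite word language L under hypotheses H,
    where each hypothesis (e, f) rewrites an occurrence of the word f
    into the word e (the rule: if C[f] is in the closure, add C[e])."""
    L = set(L)
    changed = True
    while changed:
        changed = False
        for (e, f) in H:                 # hypothesis e <= f
            for w in list(L):
                # every occurrence of f in w determines a context (prefix, suffix)
                for i in range(len(w) - len(f) + 1):
                    if w[i:i + len(f)] == f:
                        new = w[:i] + e + w[i + len(f):]
                        if new not in L:
                            L.add(new)
                            changed = True
    return L

H = [('ba', 'ab')]            # hypothesis ba <= ab
print(seq_close({'abc'}, H))  # the set {'abc', 'bac'}
```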

The analogue of strong reduction for the sequential setting is as follows.

**Definition 4.32.** *Suppose that* H *implies* H′*. A map* r : TKA → TKA *is a* sequential reduction *from* H *to* H′ *when the following hold:*

*(i) for* e ∈ TKA*, we have* e ≡<sup>H</sup><sub>KA</sub> r(e)*; and*
*(ii) for* e ∈ TKA*, we have* ⟦e⟧↓<sup>H</sup><sub>seq</sub> = ⟦r(e)⟧↓<sup>H′</sup><sub>seq</sub>*.*

H sequentially reduces *to* H′ *if there exists a sequential reduction from* H *to* H′*.*

To lift a sequential reduction to a proper reduction, the following class of hypotheses will turn out to be useful.

**Definition 4.33.** *A hypothesis* e ≤ f *with* e, f ∈ TKA *is called* grounded *if* ⟦f⟧ = {W} *for some non-empty word (totally ordered pomset)* W*. We say that a set of hypotheses* H *is grounded if every* e ≤ f ∈ H *is grounded.*

*Example 4.34.* Any hypothesis of the form e ≤ a1 ··· an for n > 0 is grounded. On the other hand, the hypothesis a ≤ 1 that we saw in the previous example is not grounded, since the semantics of 1 contains the empty pomset.

The closure of a language of words can be expressed in terms of its sequential closure, provided that the set of hypotheses is grounded.

**Lemma 4.35.** *Let* H *be grounded. If* L ⊆ Σ<sup>∗</sup>*, then* L↓<sup>H</sup> = L↓<sup>H</sup>seq*. Moreover, for* L, L′ ⊆ SP*, we have that* (L ∥ L′)↓<sup>H</sup> = L↓<sup>H</sup> ∥ L′↓<sup>H</sup>*.*

The above then allows us to turn a sequential reduction into a reduction.

**Lemma 4.36.** *Suppose that* H *sequentially reduces to* H′*. If* H *and* H′ *are grounded, then* H *strongly reduces to* H′*.*

## **5 Instantiation to CKA with Observations**

In this section, we will present Concurrent Kleene Algebra with Observations (CKAO), an extension of CKA with Boolean assertions that enable the specification of programs with the usual guarded conditionals and loops. We will obtain CKAO as an instance of CKAH by choosing a particular set of hypotheses. First, we define the set of propositional terms or Boolean observations.

**Definition 5.1.** *Fix a finite set* <sup>Ω</sup> *of* primitive observations*. The set of* propositional terms*, written* TBA*, is generated by*

$$p, q ::= \bot \mid \top \mid o \in \Omega \mid p \lor q \mid p \land q \mid \overline{p}$$

*The relation* ≡BA *is the smallest congruence on* TBA *s.t. for* p, q, r ∈ TBA*, we have*

$$p \lor \bot \equiv_{\mathsf{BA}} p \qquad p \lor q \equiv_{\mathsf{BA}} q \lor p \qquad p \lor \overline{p} \equiv_{\mathsf{BA}} \top \qquad p \lor (q \lor r) \equiv_{\mathsf{BA}} (p \lor q) \lor r$$

$$p \land \top \equiv_{\mathsf{BA}} p \qquad p \land q \equiv_{\mathsf{BA}} q \land p \qquad p \land \overline{p} \equiv_{\mathsf{BA}} \bot \qquad p \land (q \land r) \equiv_{\mathsf{BA}} (p \land q) \land r$$

$$p \lor (q \land r) \equiv\_{\mathsf{BA}} (p \lor q) \land (p \lor r) \qquad \qquad p \land (q \lor r) \equiv\_{\mathsf{BA}} (p \land q) \lor (p \land r)$$

*We will write* p ≤BA q *as a shorthand for* p ∨ q ≡BA q*.*

We write At for 2<sup>Ω</sup>, the set of *atoms* of the Boolean algebra. It is well known that every α ∈ At corresponds canonically to a Boolean term πα, such that every Boolean term p ∈ TBA is equivalent to the disjunction of all πα with πα ≤BA p [2]. To simplify notation we identify α ∈ At with πα.
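The disjunction-of-atoms normal form can be illustrated with a small Python sketch. The representation is ours: a Boolean term is given semantically, as a predicate on atoms, and an atom is the frozenset of primitive observations it makes true.

```python
from itertools import combinations

OMEGA = ["o1", "o2"]  # a small set of primitive observations (names are ours)

def atoms(omega=OMEGA):
    """All atoms At = 2^Omega, each represented by the frozenset of
    primitive observations that it makes true."""
    return [frozenset(s) for r in range(len(omega) + 1)
            for s in combinations(omega, r)]

def atoms_below(p, omega=OMEGA):
    """The atoms alpha with pi_alpha <=_BA p, for p given as a predicate
    on atoms (its Boolean-function semantics)."""
    return {a for a in atoms(omega) if p(a)}

def disjunction(alphas):
    """The disjunction of a set of atoms, again as a predicate on atoms."""
    return lambda a: a in alphas
```

Round-tripping a term through `atoms_below` and `disjunction` yields an equivalent term, which is exactly the normal-form statement above.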

We can now use TBA in defining the terms and axioms of CKAO, which will be given as a CKA over a specific alphabet with the following hypotheses:

**Definition 5.2 (CKAO).** *We define the* terms *of CKAO, denoted* TCKAO*, as* <sup>T</sup>(<sup>Σ</sup> ∪ TBA)*, that is, as the CKA terms over* <sup>T</sup>BA <sup>∪</sup> <sup>Σ</sup>*. We furthermore define the following set of hypotheses over* TCKAO*:*

$$\mathsf{bool} = \{ p = q : p, q \in \mathcal{T}\_{\mathsf{BA}} \text{ s.t. } p \equiv\_{\mathsf{BA}} q \} \qquad \mathsf{contr} = \{ p \land q \le p \cdot q : p, q \in \mathcal{T}\_{\mathsf{BA}} \}$$

glue = {0 = ⊥} ∪ {p + q = p ∨ q : p, q ∈ TBA}   obs = bool ∪ contr ∪ exch ∪ glue

*The* semantics *of CKAO is then given by* ⟦−⟧↓obs*.*

The hypotheses bool contain the Boolean identities, and glue identifies the disjunction with the union (and their respective units as well). contr specifies that if p and q hold simultaneously, then it is possible to observe them in sequence. Note that the converse inequality is not included: observing p and q in sequence has strictly more behaviour than observing p and q simultaneously, as some intervening action may happen between the two observations.

The above definition gives us the semantics of CKAO as the standard pomset language model obtained from taking the obs-closure of the semantics of CKA. As a matter of fact, we find by Lemma 4.7 that if e, f ∈ TCKAO with e ≡obs f, then ⟦e⟧↓obs = ⟦f⟧↓obs; hence, we already have a sound model of CKAO.

To prove completeness, we will use the techniques from the previous section.

*First step: reification.* We start by using reification to rid ourselves of the hypotheses from bool and glue, and to simplify the hypotheses in contr. To this end, let contr′ be the set of hypotheses given by {α ≤ α · α : α ∈ At}. Let Γ = At ∪ Σ ⊆ TBA ∪ Σ. We define r : Σ ∪ TBA → T(Γ) by setting

$$r(a) = \begin{cases} \sum_{\alpha \le_{\mathsf{BA}} p} \alpha & a = p \in \mathcal{T}_{\mathsf{BA}} \\ \mathsf{a} & a = \mathsf{a} \in \Sigma \end{cases}$$
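The letter-wise behaviour of r can be sketched in Python. Both representation choices are ours: an observation p is given as the frozenset of atoms below it, and sums are tagged tuples.

```python
def reify(a, sigma):
    """The reification map r on letters, under two representation choices
    of ours: an action letter is any element of `sigma`, and a Boolean
    observation p is given semantically, as the frozenset of atom names
    alpha with alpha <=_BA p.  Sums are represented as ("+", ...) tuples."""
    if a in sigma:
        return a                      # r(a) = a for a in Sigma
    return ("+",) + tuple(sorted(a))  # r(p) = sum of the atoms below p
```

Extending this map homomorphically over the term constructors gives the reification on all of T(Σ ∪ TBA).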

**Lemma 5.3.** *The hypotheses* obs *reduce to* exch ∪ contr′*.*

*Proof.* By Lemma 4.23, it suffices to show that r is a reification, and that obs implies exch ∪ contr′. To see that r is a reification, we check the conditions.

(i): If a ∈ Σ, then r(a) = a ≡obs a immediately. Otherwise, if p ∈ TBA, then we derive $r(p) = \sum_{\alpha \le_{\mathsf{BA}} p} \alpha \equiv_{\mathsf{glue}} \bigvee_{\alpha \le_{\mathsf{BA}} p} \alpha \equiv_{\mathsf{bool}} p$ and hence r(p) ≡obs p.

(ii): If a ∈ Σ, then we already know that r(a) = a. Otherwise, if α ∈ At, then

$$r(\alpha) = \sum_{\beta \le_{\mathsf{BA}} \alpha} \beta = \alpha$$

(iii): This property holds because all hypotheses in exch ∪ contr′ preserve Γ-languages, i.e., if e ≤ f ∈ exch ∪ contr′ where ⟦f⟧ ⊆ SP(Γ), then ⟦e⟧ ⊆ SP(Γ) too. It follows that exch ∪ contr′-closure must preserve Γ-languages.

(iv): We should show that if e ≤ f ∈ obs, then r(e) ≤exch∪contr′ r(f). To this end, we analyse the separate sets of hypotheses that make up obs.

**–** Let e ≤ f ∈ exch, then e = (g00 ∥ g01) · (g10 ∥ g11) and f = (g00 · g10) ∥ (g01 · g11), for some g00, g01, g10, g11 ∈ T. We then find that

$$r(e) = (r(g\_{00}) \parallel r(g\_{01})) \cdot (r(g\_{10}) \parallel r(g\_{11}))$$

$$r(f) = (r(g\_{00}) \cdot r(g\_{10})) \parallel (r(g\_{01}) \cdot r(g\_{11}))$$

hence r(e) ≤ r(f) ∈ exch, and therefore r(e) ≤exch∪contr′ r(f).

**–** Let e ≤ f ∈ bool, then e = p and f = q such that p ≡BA q. In that case,

$$r(p) = \sum_{\alpha \le_{\mathsf{BA}} p} \alpha = \sum_{\alpha \le_{\mathsf{BA}} q} \alpha = r(q)$$

**–** Let e ≤ f ∈ contr; then e = p ∧ q and f = p · q for p, q ∈ TBA. Then

$$\begin{aligned} r(p \wedge q) &= \sum_{\alpha \le_{\mathsf{BA}} p \wedge q} \alpha \le_{\mathsf{contr}'} \sum_{\alpha \le_{\mathsf{BA}} p \wedge q} \alpha \cdot \alpha \\ &\le \left(\sum_{\alpha \le_{\mathsf{BA}} p} \alpha\right) \cdot \left(\sum_{\alpha \le_{\mathsf{BA}} q} \alpha\right) = r(p) \cdot r(q) = r(p \cdot q) \end{aligned}$$

**–** Let e ≤ f ∈ glue. On the one hand, if e = p ∨ q and f = p + q, then

$$r(p \lor q) = \sum_{\alpha \le_{\mathsf{BA}} p \lor q} \alpha \equiv \sum_{\alpha \le_{\mathsf{BA}} p} \alpha + \sum_{\alpha \le_{\mathsf{BA}} q} \alpha = r(p) + r(q) = r(p + q)$$

This also establishes the case for f ≤ e ∈ glue. On the other hand, if e = 0 and f = ⊥, then $r(0) = 0 = \sum_{\alpha \le_{\mathsf{BA}} \bot} \alpha = r(\bot)$.

To see that obs implies exch ∪ contr′, it suffices to show that obs implies contr′. To this end, note that if e ≤ f ∈ contr′, then e = α and f = α · α for some α ∈ At. We can then derive that α ≡bool α ∧ α ≤contr α · α, and hence e ≤obs f.

*Second step: factorising.* Since contr′ satisfies the precondition of Lemma 4.28, we obtain the following.

**Lemma 5.4.** *The hypotheses* exch ∪ contr′ *factorise into* exch *and* contr′*.*

This means that, by Lemma 4.17, all that remains is to strongly reduce exch and contr′ to ∅; we have already taken care of the former in Theorem 4.26.

*Third step: reducing* contr′*.* In [13], we have already shown that contr′ sequentially reduces to ∅. Since contr′ is grounded, we find the following by Lemma 4.36.

**Lemma 5.5.** *The hypotheses* contr′ *strongly reduce to* ∅*.*

*Last step: putting it all together.* Using the above reductions, we can then prove completeness of ≡obs w.r.t. ⟦−⟧↓obs, and decidability of semantic equivalence, too.

**Theorem 5.6 (Soundness and Completeness of CKAO).** *Let* e, f ∈ TCKAO*.*

*(i) We have* e ≡obs f *if and only if* ⟦e⟧↓obs = ⟦f⟧↓obs*.*

*(ii) It is decidable whether* ⟦e⟧↓obs = ⟦f⟧↓obs*.*

*Proof.* For the first claim, we already knew the implication from left to right from Lemma 4.7. Conversely, and for the second claim, first note that obs reduces to exch ∪ contr′ by Lemma 5.3. By Lemma 5.4 and Lemma 4.17, the latter reduces to ∅, if we apply Theorem 4.26 and Lemma 5.5. By Lemma 4.12, we then conclude that ≡obs is complete and decidable, establishing the claim.

## **6 Discussion**

The first contribution of this paper is to extend Kleene algebra with hypotheses [7] with a parallel operator. The resulting framework, concurrent Kleene algebra with hypotheses (CKAH), is interpreted over pomset languages, a standard model of concurrency. We start from simple axioms, known to capture equality of pomset languages [23]. CKAH allows one to add custom axioms, the so-called hypotheses, which may be used to include domain-specific information in the language. We develop this framework by providing a systematic way of producing from the hypotheses a sound pomset language model. We also propose techniques that may be used to prove completeness and decidability of the resulting model.

An important instance of this framework is concurrent Kleene algebra (CKA) as presented in [11]. The only additional axiom there, known as the exchange law, may be added as a set of hypotheses. We prove that the resulting semantics coincides with the (subsumption-closed) semantics of CKA and, more interestingly, the completeness proof of [15] can be recovered as an instance of this framework.

The second contribution is a new framework to reason about programs with concurrency: concurrent Kleene algebra with observations (CKAO). CKAO is obtained as an instance of CKAH, where we add the exchange law to model concurrent behaviour, and Boolean assertions to model control flow. The Boolean assertions we consider are as in Kleene algebra with observations (KAO) [13] — in fact, CKAO is a conservative extension of KAO. Using the techniques developed earlier, we obtain a sound and complete semantics for this algebra. While CKAO is similar to concurrent Kleene algebra with tests [12], it avoids the problems of the latter by distinguishing conjunction and sequential composition. CKAO provides the first sound and complete algebraic theory that seems sensible as a framework to reason about concurrent programs with Boolean assertions.

Future work is to explore other meaningful instances of CKAH. Synchronous Kleene algebra [29,26] is a natural candidate for this. We also want to try and design domain specific languages, specifically, a concurrent variant of NetKAT [1,8].

The class of hypotheses considered in this paper for which decidability and completeness may be established systematically is somewhat restrictive; identifying larger classes of tractable hypotheses is a challenging open problem.

Because of the compositional nature of our model, the CKAO semantics of a program contains behaviours that are not possible to obtain in isolation. These behaviours are present to allow the program to interact meaningfully with its environment, i.e., when placed in a context. However, for practical purposes one might want to close the system, and only consider behaviours that are possible in isolation. Studying this semantics remains subject of future work.

In the semantics of concurrent programs with assertions, it would be natural to see atoms as partial instead of total functions. This captures the intuition that a thread might not have access to the complete machine state, but instead holds a partial view of it. Pseudo-complemented distributive lattices (PCDL) have been proposed [12] as an alternative to Boolean algebra, modelling this partiality of information. We leave it to future work to investigate the variant of CKAO obtained by replacing the Boolean algebra of observations with a PCDL.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Graded Algebraic Theories**

Satoshi Kura<sup>1,2</sup>

<sup>1</sup> National Institute of Informatics, Tokyo, Japan <sup>2</sup> The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan kura@nii.ac.jp

**Abstract.** We provide graded extensions of algebraic theories and Lawvere theories that correspond to graded monads. We prove that graded algebraic theories, graded Lawvere theories, and finitary graded monads are equivalent via equivalence of categories, which extends the equivalence for monads. We also give sums and tensor products of graded algebraic theories to combine computational effects as an example of importing techniques based on algebraic theories to graded monads.

## **1 Introduction**

In the field of denotational semantics of programming languages, monads have been used to express computational effects since Moggi's seminal work [18]. They have many applications from both theoretical and practical points of view.

Monads correspond to *algebraic theories* [5]. This correspondence gives natural presentations of many kinds of computational effects by operations and equations [21], which is the basis of algebraic effects [20]. The algebraic perspective of monads also provides ways of combining [9], reasoning about [22], and handling computational effects [23].

*Graded monads* [27] are a refinement of monads, defined as a monad-like structure indexed by a monoidal category (or a preordered monoid). The unit and multiplication of graded monads are required to respect the monoidal structure. This structure enables graded monads to express some kind of "abstraction" of effectful computations. For example, graded monads are used to give denotational semantics of effect systems [12], which are type systems designed to estimate scopes of computational effects caused by programs.

This paper provides a *graded extension of algebraic theories* that corresponds to monads graded by small strict monoidal categories. This generalizes N-graded theories in [17]. The main ideas of this extension are the following. First, we assign to each operation a *grade*, i.e., an object in a monoidal category that represents effects. Second, our extension provides a mechanism (Fig. 1) to keep track of effects in the same way as graded monads. That is, if an operation f with grade m is applied to terms with grade m′, then the grade of the whole term is the product m ⊗ m′.

$$\frac{f \in \Sigma_{n,m} \qquad t_i \in T_{m'}^\Sigma(X) \text{ for each } i \in \{1, \ldots, n\}}{f(t_1, \ldots, t_n) \in T_{m \otimes m'}^\Sigma(X)}$$

**Fig. 1.** A rule of term formation.

For example, graded algebraic theories enable us to estimate (an overapproximation of) the set of memory locations a computation may access. The side-effects theory [21] is given by operations lookupl and updatel,v for each location l ∈ L and value v ∈ V, together with several equations; each term represents a computation with side-effects. Since lookupl and updatel,v only read from or write to the location l, we assign {l} ∈ **2**<sup>L</sup> as the grade of these operations in the graded version of the side-effects theory, where **2**<sup>L</sup> is the join-semilattice of subsets of the set L of locations. The grade of a term is then (an overapproximation of) the set of memory locations the computation may access, thanks to the rule in Fig. 1.
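The grade computation for the graded side-effects theory can be sketched in a few lines of Python. The term representation is ours: a term is a variable or a pair of an operation grade and its subterms.

```python
def grade(term):
    """Grade of a term over the 2^L-graded side-effects signature.

    Representation (ours): a term is either a variable (a string) or a
    pair (g, subterms), where g is the set of locations the top operation
    touches ({l} for lookup_l and update_{l,v}).  Since 2^L is a
    join-semilattice, the least grade of a term is the union of the
    grades of all operations occurring in it.
    """
    if isinstance(term, str):
        return frozenset()          # a variable has the unit grade: no access
    g, subterms = term
    result = frozenset(g)
    for t in subterms:
        result |= grade(t)
    return result
```

In the formal system the subterms of one operation must share a single grade m′; the join-semilattice structure lets each subterm be coerced up to the union, which the rule of Fig. 1 then joins with the operation's own grade.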

We also provide *graded Lawvere theories* that correspond to graded algebraic theories. The intuition behind a Lawvere theory is that of a category whose morphisms are the terms of an algebraic theory. We use this intuition to define graded Lawvere theories. In graded algebraic theories, each term has a grade, and substitution of terms must respect the monoidal structure of grades. To characterize this structure of "graded" terms, we consider Lawvere theories enriched in a presheaf category.

Just as algebraic theories brought many concepts and techniques to the semantics of computational effects, we expect that the proposed graded algebraic theories will do the same for effect systems. We look into one example out of such possibilities: combining graded algebraic theories.

The main contributions of this paper are summarized as follows.


## **2 Preliminaries**

#### **2.1 Enriched Category Theory**

We review enriched category theory and introduce notations. See [13] for details.

Let **V**0 = (**V**0, ⊗, I) be a (not necessarily symmetric) monoidal category. **V**0 is *right closed* if (−) ⊗ X : **V**0 → **V**0 has a right adjoint [X, −] for each X ∈ ob**V**0. Similarly, **V**0 is *left closed* if X ⊗ (−) has a right adjoint ⟨X, −⟩ for each X ∈ ob**V**0. **V**0 is *biclosed* if **V**0 is both left and right closed.

Let **V**0<sup>t</sup> denote the monoidal category (**V**0, ⊗<sup>t</sup>, I) where ⊗<sup>t</sup> is defined by X ⊗<sup>t</sup> Y := Y ⊗ X. Note that **V**0<sup>t</sup> is right closed if and only if **V**0 is left closed.

We define **V**0*-category*, **V**0*-functor* and **V**0*-natural transformation* as in [13].

If **V**0 is right closed, then **V**0 itself enriches to a **V**0-category **V** with hom-object given by **V**(X, Y ) := [X, Y ]. We use the subscript (−)0 to distinguish the enriched category **V** from its underlying category **V**0.

Assume that **V**0 is biclosed and let **A** be a **V**0-category. The *opposite category* **A**<sup>op</sup> is the **V**0<sup>t</sup>-category defined by **A**<sup>op</sup>(X, Y ) = **A**(Y, X). For any X ∈ ob**A**, **A**(X, −) : **A** → **V**0 is a **V**0-functor, where **A**(X, −)Y,Z : **A**(Y, Z) → [**A**(X, Y ), **A**(X, Z)] is defined by transposing the composition law ◦ of **A**. A **V**0<sup>t</sup>-functor **A**(−, X) is defined by **A**<sup>op</sup>(X, −) : **A**<sup>op</sup> → **V**0<sup>t</sup>.

Let **A** be a **V**0-category. For each X ∈ **V**0 and C ∈ **A**, a *tensor* X ⊗ C is an object in **A** together with a counit morphism ν : X → **A**(C, X ⊗ C) such that the **V**0-natural transformation **A**(X ⊗ C, −) → ⟨X, **A**(C, −)⟩ obtained by transposing (◦) ◦ (**A**(X ⊗ C, B) ⊗ ν) is an isomorphism, where ◦ is the composition in the **V**0-category **A**. A *cotensor* X ⋔ C is a tensor in **A**<sup>op</sup>. For example, if **V**0 = **Set**, then tensors X ⊗ C are copowers X · C, and cotensors X ⋔ C are powers C<sup>X</sup>.

A **V**0-functor F : **A** → **B** is said to preserve a tensor X ⊗ C if FC,X⊗C ◦ ν : X → **B**(F C, F(X ⊗ C)) is again a counit morphism. F preserves cotensors if F<sup>op</sup> preserves tensors.

Let Φ be a collection of objects in **V**0. A **V**0-functor F : **A** → **B** is said to preserve Φ-(co)tensors if F preserves (co)tensors of the form X ⊗ C (resp. X ⋔ C) for each X ∈ Φ and C ∈ ob**A**.

#### **2.2 Graded Monads**

We review the notion of graded monad in [7, 12], and then define the category **GMnd<sup>M</sup>** of finitary **M**-graded monads. Throughout this section, we fix a small strict monoidal category **<sup>M</sup>** = (**M**, <sup>⊗</sup>, I).

**Definition 1 (graded monads).** An **M***-graded monad* on **C** is a lax monoidal functor **<sup>M</sup>** <sup>→</sup> [**C**, **<sup>C</sup>**] where [**C**, **<sup>C</sup>**] is a monoidal category with composition as multiplication. That is, an **<sup>M</sup>**-graded monad is a tuple (∗, η, μ) of a functor <sup>∗</sup> : **<sup>M</sup>** <sup>×</sup> **<sup>C</sup>** <sup>→</sup> **<sup>C</sup>** and natural transformations <sup>η</sup><sup>X</sup> : <sup>X</sup> <sup>→</sup> <sup>I</sup> <sup>∗</sup> <sup>X</sup> and <sup>μ</sup><sup>m</sup>1,m2,X : <sup>m</sup><sup>1</sup> <sup>∗</sup> (m<sup>2</sup> <sup>∗</sup> <sup>X</sup>) <sup>→</sup> (m<sup>1</sup> <sup>⊗</sup> <sup>m</sup><sup>2</sup>) <sup>∗</sup> <sup>X</sup> such that the following diagrams commute.

$$\mu_{I,m,X} \circ \eta_{m*X} = \mathrm{id}_{m*X} \qquad \mu_{m,I,X} \circ (m * \eta_X) = \mathrm{id}_{m*X}$$

$$\mu_{m_1 \otimes m_2, m_3, X} \circ \mu_{m_1, m_2, m_3 * X} = \mu_{m_1, m_2 \otimes m_3, X} \circ (m_1 * \mu_{m_2, m_3, X})$$

A *morphism of* **M***-graded monads* is a monoidal natural transformation α : (∗, η, μ) → (∗′, η′, μ′), i.e., a natural transformation α : ∗ → ∗′ that is compatible with η and μ.

The intuition is that graded monads refine monads: m ∗ X is a computation whose scope of effects is indicated by m and whose result is in X. The monoidal category **M** defines the granularity of the refinement, and a **1**-graded monad is just an ordinary monad. Note that we do not assume that **M** is symmetric, because some of the graded monads in [12] require **M** to be non-symmetric. We also deal with such a non-symmetric case in Example 25.
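As a concrete toy instance (ours, not taken from the paper): an ℕ-graded writer monad on **Set**, where the grade monoid is (ℕ, +, 0) and m ∗ X is the set of pairs of an output string of length m with a result in X.

```python
def unit(x):
    """eta_X : X -> I * X, with I = 0: attach the empty output."""
    return ("", x)

def mult(pair):
    """mu_{m1,m2,X} : m1 * (m2 * X) -> (m1 + m2) * X:
    flatten nested outputs by concatenation, so grades add."""
    w1, (w2, x) = pair
    return (w1 + w2, x)

def grade_of(pair):
    """The grade of an element of m * X is the length m of its output."""
    return len(pair[0])
```

On this instance the commuting diagrams of Definition 1 become concrete equations: flattening after attaching an empty output is the identity, and the two ways of flattening a triply nested output agree.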

A *finitary functor* is a functor that preserves filtered colimits. In this paper, we focus on finitary graded monads on **Set**.

**Definition 2.** A *finitary* **M***-graded monad on* **Set** is a lax monoidal functor **<sup>M</sup>** <sup>→</sup> [**Set**, **Set**]<sup>f</sup> where [**Set**, **Set**]<sup>f</sup> denotes the full subcategory of [**Set**, **Set**] on finitary functors. Let **GMnd<sup>M</sup>** denote the category of finitary **M**-graded monads and monoidal natural transformations between them.

A morphism in **GMnd<sup>M</sup>** is determined by the restriction to ℵ<sup>0</sup> ⊆ **Set** where ℵ<sup>0</sup> is the full subcategory of **Set** on natural numbers.

**Lemma 3.** *Let* T = (∗, η, μ) *and* T′ = (∗′, η′, μ′) *be finitary* **M***-graded monads. There exists a one-to-one correspondence between the following.*


(Diagrams omitted: the components β are required to commute with the units η, η′ and the multiplications μ, μ′, and to be natural in n.)

*Proof.* By the equivalence [**Set**, **Set**]f ≃ [ℵ0, **Set**] induced by restriction and the left Kan extension along the inclusion i : ℵ0 → **Set**.

#### **2.3 Day Convolution**

We describe a monoidal biclosed structure on the (covariant) presheaf category [**M**, **Set**]<sup>0</sup> where **<sup>M</sup>** = (**M**, <sup>⊗</sup>, I) is a small monoidal category [3]. Here, we use the subscript (−)<sup>0</sup> to indicate that [**M**, **Set**]<sup>0</sup> is an ordinary (not enriched) category since we also use the enriched version [**M**, **Set**] later.

The *external tensor product* F ⊠ G : **M** × **M** → **Set** is defined by (F ⊠ G)(m1, m2) = Fm1 × Gm2 for any F, G : **M** → **Set**.

**Definition 4.** Let F, G : **M** → **Set** be functors. The *Day tensor product* F ⊗ˇ G : **M** → **Set** is the left Kan extension Lan⊗(F ⊠ G) of the external tensor product F ⊠ G : **M** × **M** → **Set** along the tensor product ⊗ : **M** × **M** → **M**.

Note that a natural transformation θ : F ⊗ˇ G → H is equivalent to a family of maps θm1,m2 : Fm1 × Gm2 → H(m1 ⊗ m2) natural in m1 and m2, by the universal property.
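When **M** is *discrete*, i.e., just a monoid, the left Kan extension in Definition 4 degenerates to a coproduct over factorisations, which can be computed directly. The following sketch (function names ours) does this for finite presheaves.

```python
from itertools import product

def day_tensor(F, G, elems, tensor):
    """(F day G)(m) = coproduct over m = m1 (x) m2 of F(m1) x G(m2),
    valid when M is a discrete monoidal category (a plain monoid).
    F and G map monoid elements to finite sets; `tensor` is the
    monoid multiplication."""
    result = {m: set() for m in elems}
    for m1, m2 in product(elems, repeat=2):
        for a, b in product(F[m1], G[m2]):
            # tag each pair with its factorisation, as the coproduct requires
            result[tensor(m1, m2)].add(((m1, a), (m2, b)))
    return result
```

With morphisms in **M**, the coproduct would instead be a genuine colimit, identifying pairs along the action of **M** on F and G.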

The Day convolution induces a monoidal biclosed structure on [**M**, **Set**]0 [3].

**Proposition 5.** *The Day tensor product makes* ([**M**, **Set**]0, ⊗ˇ , y(I)) *a monoidal biclosed category, where* y : **M**<sup>op</sup> → [**M**, **Set**]0 *is the Yoneda embedding* y(m) := **M**(m, −)*.*

The left and the right closed structures are given by ⟨F, G⟩m = [**M**, **Set**]0(F, G(m ⊗ −)) and [F, G]m = [**M**, **Set**]0(F, G(− ⊗ m)) for each m ∈ **M**, respectively.

Note that since we do not assume **M** to be symmetric, neither is [**M**, **Set**]0. Note also that twisting and the above construction commute: there is an isomorphism [**M**, **Set**]0<sup>t</sup> ≅ [**M**<sup>t</sup>, **Set**]0 of monoidal categories.

#### **2.4 Categories Enriched in a Presheaf Category**

We rephrase the definitions of [**M**, **Set**]0-enriched category, functor and natural transformation in elementary terms. An [**M**, **Set**]0-category is, so to say, an "**M**-graded" category: each morphism has a grade m ∈ ob**M**, and the grade of the composite of two morphisms with grades m and m′ is the product m ⊗ m′ of the grades of each morphism. Likewise, [**M**, **Set**]0-functors and [**M**, **Set**]0-natural transformations can also be understood as "**M**-graded" versions of ordinary functors and natural transformations. Specifically, the following lemma holds [2].

**Lemma 6.** *There is a one-to-one correspondence between (1) an* [**M**, **Set**]0 *category* **C** *and (2) the following data satisfying the following conditions.*


*These data must satisfy the identity law* 1Y ◦ f = f = f ◦ 1X *for each* f ∈ **C**(X, Y )m *and the associativity* (h ◦ g) ◦ f = h ◦ (g ◦ f) *for each* f ∈ **C**(X, Y )m1*,* g ∈ **C**(Y, Z)m2 *and* h ∈ **C**(Z, W)m3*.*

*Proof.* The identity 1X : y(I) → **C**(X, X) in **C** corresponds to 1X ∈ **C**(X, X)I by the Yoneda lemma, and the composition ◦ : **C**(Y, Z) ⊗ˇ **C**(X, Y ) → **C**(X, Z) in **C** corresponds to the natural transformation ◦m1,m2 : **C**(Y, Z)m1 × **C**(X, Y )m2 → **C**(X, Z)(m1 ⊗ m2) by the universal property of the Day convolution. The rest of the proof is easy.

An [**M**, **Set**]0-functor F : **C** → **D** consists of a mapping X → F X and a natural transformation FX,Y : **C**(X, Y ) → **D**(FX, F Y ) (for each X, Y ) that preserves identities and compositions of morphisms. An [**M**, **Set**]0-natural transformation α : F → G is a family of elements (αX ∈ **D**(F X, GX)I)X∈ob**C** that satisfies αY ◦ F f = Gf ◦ αX for each f ∈ **C**(X, Y )m. Vertical and horizontal compositions of [**M**, **Set**]0-natural transformations are defined as expected.

We introduce a useful construction of [**M**, **Set**]0<sup>t</sup>-categories. Given an **M**-graded monad (in other words, a lax left **M**-action) on **C**, we can define an [**M**, **Set**]0<sup>t</sup>-enriched category as follows.

**Definition 7.** Let T = (∗, η, μ) be an **M**-graded monad on **C**. An [**M**, **Set**]0<sup>t</sup>-category **C**<sup>T</sup> is defined by ob**C**<sup>T</sup> := ob**C** and **C**<sup>T</sup>(X, Y )m := **C**(X, m ∗ Y ). The identity morphisms are the unit morphisms ηX ∈ **C**<sup>T</sup>(X, X)I, and the composite of f ∈ **C**<sup>T</sup>(Y, Z)m and g ∈ **C**<sup>T</sup>(X, Y )m′ is μ ◦ (m′ ∗ f) ◦ g.

The definition of $\mathbf{C}^{\mathcal{T}}$ is similar to the definition of Kleisli categories for ordinary monads. Actually, $\mathbf{C}^{\mathcal{T}}$ can be constructed via the Kleisli category $\mathbf{C}_{\mathcal{T}}$ for the graded monad $\mathcal{T}$ presented in [7] (although $\mathbf{C}_{\mathcal{T}}$ itself is not enriched). This can be observed from $\mathbf{C}_{\mathcal{T}}((I,X),(m,Y)) \cong \mathbf{C}^{\mathcal{T}}(X,Y)m$.

## **3 Graded Algebraic Theories**

We explain a framework of universal algebra for graded monads, which is a natural extension of [17, 27]. The key idea of this framework is that each term is associated with not only an arity but also a "grade", which is represented by an object of a monoidal category $\mathbf{M}$. We also add a coercion construct for terms, which changes the grade of a term along a morphism of the monoidal category $\mathbf{M}$. Then, the mapping that takes $m \in \mathbf{M}$ and a set of variables $X$ and returns the set of terms with grade $m$ (modulo the equational axioms) yields a graded monad. We fix a small strict monoidal category $\mathbf{M} = (\mathbf{M}, \otimes, I)$ throughout this section. We sometimes identify $n \in \mathbb{N}$ with $\{1,\dots,n\}$, or with $\{x_1,\dots,x_n\}$ if it is used as a set of variables.

#### **3.1 Equational Logic**

A *signature* is a family of sets of symbols $\Sigma = (\Sigma_{n,m})_{n \in \mathbb{N}, m \in \mathbf{M}}$. An element $f \in \Sigma_{n,m}$ is called an operation with arity $n$ and grade $m$. We define a sufficient structure to interpret operations in a category $\mathbf{C}$ as follows.

**Definition 8.** The $\mathbf{M}$-*model condition* is defined by the following conditions on a tuple $(\mathbf{C}, (\circledast, \check\eta, \check\mu))$: (1) $\mathbf{C}$ has finite powers; (2) $(\circledast, \check\eta, \check\mu)$ is a strong $\mathbf{M}^t$-action on $\mathbf{C}$; (3) $m \circledast (-) : \mathbf{C} \to \mathbf{C}$ preserves finite powers for each $m \in \mathbf{M}$.
**Example 9.** If $\mathbf{A}$ is a category with finite powers, then the functor category $[\mathbf{M},\mathbf{A}]$ has a strong $\mathbf{M}^t$-action defined by $m \circledast F := F(m \otimes (-))$ and satisfies the $\mathbf{M}$-model condition. In particular, $[\mathbf{M},\mathbf{Set}]_0$ satisfies the $\mathbf{M}$-model condition.

A *model* $\mathcal{A} = (A, |\cdot|^{\mathcal{A}})$ of $\Sigma$ in a category $\mathbf{C}$ satisfying the $\mathbf{M}$-model condition consists of an object $A \in \mathbf{C}$ and an interpretation $|f|^{\mathcal{A}} : A^n \to m \circledast A$ for each $f \in \Sigma_{n,m}$. A *homomorphism* $\alpha : \mathcal{A} \to \mathcal{B}$ between two models $\mathcal{A}, \mathcal{B}$ is a morphism $\alpha : A \to B$ in $\mathbf{C}$ such that $(m \circledast \alpha) \circ |f|^{\mathcal{A}} = |f|^{\mathcal{B}} \circ \alpha^n$ for each $f \in \Sigma_{n,m}$.

**Definition 10.** Let $X$ be a set of variables. The set of ($\mathbf{M}$-graded) $\Sigma$-terms $T^\Sigma_m(X)$ for each $m \in \mathbf{M}$ is defined inductively as follows.

$$\frac{x \in X}{x \in T^\Sigma_I(X)} \qquad \frac{t \in T^\Sigma_m(X) \quad w : m \to m'}{c_w(t) \in T^\Sigma_{m'}(X)} \qquad \frac{f \in \Sigma_{n,m} \qquad \forall i \in \{1,\dots,n\},\; t_i \in T^\Sigma_{m'}(X)}{f(t_1,\dots,t_n) \in T^\Sigma_{m \otimes m'}(X)}$$

That is, we build $\Sigma$-terms from variables by applying operations in $\Sigma$ and coercions $c_w$ while keeping track of the grade of terms. When applying operations, we sometimes write $f(\lambda i \in n.\, t_i)$ or $f(\lambda i.\, t_i)$ instead of $f(t_1,\dots,t_n)$.
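The grade bookkeeping in these rules is easy to mechanize. Below is a small, purely illustrative Python sketch (not from the paper): grades are drawn from the preordered monoid $(\mathbb{N}, \leq, +, 0)$, so a coercion $c_w$ with $w : m \to m'$ exists exactly when $m \leq m'$, and the signature `SIG` is a hypothetical example of our own.

```python
# Illustrative only: grades live in the preordered monoid M = (N, <=, +, 0),
# so a coercion w : m -> m' exists iff m <= m'.  Terms are nested tuples:
# ("var", x), ("coerce", m', t), or ("op", f, [t1, ..., tn]).

SIG = {"raise": (0, 3), "choose": (2, 1)}   # hypothetical ops: name -> (arity, grade)

def grade(t, sig=SIG):
    """Compute the grade of a well-formed graded term (cf. Definition 10)."""
    kind = t[0]
    if kind == "var":                        # a variable has grade I = 0
        return 0
    if kind == "coerce":                     # c_w(t) : m -> m' requires m <= m'
        _, target, body = t
        m = grade(body, sig)
        if m > target:
            raise ValueError(f"no coercion from {m} to {target}")
        return target
    if kind == "op":                         # f(t1, ..., tn) has grade m + m'
        _, f, args = t
        arity, m = sig[f]
        if len(args) != arity:
            raise ValueError(f"{f} expects {arity} arguments")
        arg_grades = {grade(a, sig) for a in args}
        if len(arg_grades) > 1:              # all arguments must share a grade m'
            raise ValueError("coerce the arguments to a common grade first")
        m_prime = arg_grades.pop() if arg_grades else 0
        return m + m_prime
    raise ValueError("ill-formed term")

t = ("op", "choose", [("coerce", 3, ("var", "x")), ("op", "raise", [])])
print(grade(t))  # 1 + 3 = 4
```

Note how the third rule of Definition 10 forces all arguments of an operation to share a single grade $m'$, which is why the sketch rejects mixed-grade arguments instead of joining them.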

**Definition 11.** Let $\mathcal{A}$ be a model of a signature $\Sigma$. For each $m \in \mathbf{M}$ and $s \in T^\Sigma_m(n)$, the *interpretation* $|s|^{\mathcal{A}} : A^n \to m \circledast A$ is defined as follows.

$$|x_i|^{\mathcal{A}} := \check\eta \circ \pi_i \qquad |c_w(t)|^{\mathcal{A}} := (w \circledast A) \circ |t|^{\mathcal{A}} \qquad |f(t_1,\dots,t_k)|^{\mathcal{A}} := \check\mu \circ (m' \circledast |f|^{\mathcal{A}}) \circ \langle |t_1|^{\mathcal{A}},\dots,|t_k|^{\mathcal{A}} \rangle$$

Here, in the last clause, $f \in \Sigma_{k,m}$, $t_i \in T^\Sigma_{m'}(n)$, and we use the canonical isomorphism $(m' \circledast A)^k \cong m' \circledast (A^k)$ given by the $\mathbf{M}$-model condition.
When we interpret a term $t \in T^\Sigma_m(X)$, we need to pick a finite set $n$ such that $\mathrm{fv}(t) \subseteq n \subseteq X$, where $\mathrm{fv}(t)$ is the set of free variables of $t$; but the choice of this finite set does not matter as long as we consider only equality of interpretations, by the following fact. If $\sigma : n \to n'$ is a renaming of variables and $\sigma : T^\Sigma_m(n) \to T^\Sigma_m(n')$ is the mapping induced by the renaming $\sigma$, then for each $t \in T^\Sigma_m(n)$ we have $|\sigma(t)|^{\mathcal{A}} = |t|^{\mathcal{A}} \circ A^\sigma$, which implies that equality of the interpretations of two terms $s, t$ is preserved by renaming: $|s| = |t|$ implies $|\sigma(s)| = |\sigma(t)|$.

An *equational axiom* is a family of sets $E = (E_m)_{m \in \mathbf{M}}$ where $E_m$ is a set of pairs of terms in $T^\Sigma_m(X)$. We sometimes identify $E$ with its union $\bigcup_{m \in \mathbf{M}} E_m$. A *presentation of an $\mathbf{M}$-graded algebraic theory* (or simply an *$\mathbf{M}$-graded algebraic theory*) is a pair $\mathcal{T} = (\Sigma, E)$ of a signature and an equational axiom. A *model* $\mathcal{A}$ of $(\Sigma, E)$ is a model of $\Sigma$ that satisfies $|s|^{\mathcal{A}} = |t|^{\mathcal{A}}$ for each $(s = t) \in E$. Let $\mathrm{Mod}_{\mathcal{T}}(\mathbf{C})$ denote the category of models of $\mathcal{T}$ in $\mathbf{C}$ and homomorphisms between them.

To obtain a graded monad on $\mathbf{Set}$ from $\mathcal{T}$, we need a strict left action of $\mathbf{M}$ on $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ and an adjunction between $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ and $\mathbf{Set}$. The former is defined below, while the latter is described in §3.2.

**Lemma 12.** *Let $\mathbf{C}$ be a category satisfying the $\mathbf{M}_1 \times \mathbf{M}_2$-model condition. If $\mathcal{T}$ is an $\mathbf{M}_1$-graded algebraic theory, then $\mathbf{C}$ satisfies the $\mathbf{M}_1$-model condition and $\mathrm{Mod}_{\mathcal{T}}(\mathbf{C})$ satisfies the $\mathbf{M}_2$-model condition.*

*Proof.* An $\mathbf{M}_1^t$-action on $\mathbf{C}$ is obtained by composing the $\mathbf{M}_1^t \times \mathbf{M}_2^t$-action with the strong monoidal functor $\mathbf{M}_1^t \to \mathbf{M}_1^t \times \mathbf{M}_2^t$ defined by $m \mapsto (m, I)$. Finite powers and an $\mathbf{M}_2^t$-action for $\mathrm{Mod}_{\mathcal{T}}(\mathbf{C})$ are induced by those for $\mathbf{C}$. □

**Corollary 13.** *$\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ has an $\mathbf{M}$-action, which is given by precomposition of $m \otimes (-)$ like the $\mathbf{M}$-action of Example 9.*

*Proof.* $[\mathbf{M},\mathbf{Set}]_0$ has an $\mathbf{M}^t \times \mathbf{M}$-action defined by $(m_1, m_2) * F = F(m_1 \otimes (-) \otimes m_2)$. Thus, the $\mathbf{M}$-action for $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ is obtained by Lemma 12. □

Substitution $s[t_1/x_1,\dots,t_k/x_k]$ for $\mathbf{M}$-graded $\Sigma$-terms can be defined as usual, but we have to take care of grades: given $s \in T^\Sigma_m(k)$ and $t_1,\dots,t_k \in T^\Sigma_{m'}(n)$, the substitution $s[t_1/x_1,\dots,t_k/x_k]$ is defined as a term in $T^\Sigma_{m \otimes m'}(n)$.
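To make the grade bookkeeping of substitution concrete, here is a small Python sketch (our own illustration, with grades again in the preordered monoid $(\mathbb{N}, \leq, +, 0)$; the operations `f` and `g` are hypothetical):

```python
# Terms: ("var", x), ("coerce", m', t), ("op", f, [args]); grades in (N, <=, +, 0).

def substitute(s, env, shift):
    """s[t1/x1, ..., tk/xk], where every substituted term has grade `shift`.

    Coercion targets are translated by `shift`: a coercion w : m -> m'' inside s
    becomes w (x) shift : m + shift -> m'' + shift after substitution, so the
    whole result has grade grade(s) + shift, matching T_{m (x) m'}(n)."""
    kind = s[0]
    if kind == "var":
        return env[s[1]]
    if kind == "coerce":
        _, target, body = s
        return ("coerce", target + shift, substitute(body, env, shift))
    _, f, args = s
    return ("op", f, [substitute(a, env, shift) for a in args])

# s = f(c_2(x)) with f of (hypothetical) grade 1: grade(s) = 1 + 2 = 3.
# Substituting a grade-1 term for x yields a term of grade 3 + 1 = 4.
s = ("op", "f", [("coerce", 2, ("var", "x"))])
t = ("op", "g", [])                  # assume g has arity 0 and grade 1
print(substitute(s, {"x": t}, 1))    # ('op', 'f', [('coerce', 3, ('op', 'g', []))])
```

The translation of coercion targets is exactly the interchange rule of Definition 14 read from left to right.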

We obtain an equational logic for graded theories by adding some additional rules to the usual equational logic.

**Definition 14.** The entailment relation $\mathcal{T} \vdash s = t$ (where $s, t \in T^\Sigma_m(X)$) for an $\mathbf{M}$-graded theory $\mathcal{T}$ is defined by adding the following rules to the standard rules, i.e. reflexivity, symmetry, transitivity, congruence, substitution, and the axioms in $E$ (see e.g. [26] for the standard rules of equational logic).

$$\frac{s, t \in T^\Sigma_m(X) \quad \mathcal{T} \vdash s = t \quad w : m \to m'}{\mathcal{T} \vdash c_w(s) = c_w(t)} \qquad \frac{t \in T^\Sigma_m(X)}{\mathcal{T} \vdash c_{1_m}(t) = t} \qquad \frac{t \in T^\Sigma_m(X) \quad w : m \to m' \quad w' : m' \to m''}{\mathcal{T} \vdash c_{w'}(c_w(t)) = c_{w' \circ w}(t)}$$

$$\frac{f \in \Sigma_{n,m} \qquad t_i \in T^\Sigma_{m'}(X) \text{ for each } i \in \{1,\dots,n\} \qquad w : m' \to m''}{\mathcal{T} \vdash f(c_w(t_1),\dots,c_w(t_n)) = c_{m \otimes w}(f(t_1,\dots,t_n))}$$

**Definition 15.** Given a model $\mathcal{A}$ of $\mathcal{T}$, we write $\mathcal{A} \models s = t$ if $s, t \in T^\Sigma_m(n)$ (for some $n$) and $|s|^{\mathcal{A}} = |t|^{\mathcal{A}}$. If $\mathbf{C}$ is a category satisfying the $\mathbf{M}$-model condition, we write $\mathcal{T}, \mathbf{C} \models s = t$ if $\mathcal{A} \models s = t$ for any model $\mathcal{A}$ of $\mathcal{T}$ in $\mathbf{C}$.

It is easy to verify that the equational logic in Definition 14 is sound.

**Theorem 1 (soundness).** *$\mathcal{T} \vdash s = t$ implies $\mathcal{T}, \mathbf{C} \models s = t$.*

#### **3.2 Free Models**

We describe a construction of a free model $F^{\mathcal{T}}X \in \mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ of a graded theory $\mathcal{T}$ generated by a set $X$, which induces an adjunction between $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ and $\mathbf{Set}$. This adjunction, together with the $\mathbf{M}$-action of Corollary 13, gives a graded monad as described in [7].

**Definition 16 (free model $F^{\mathcal{T}}X$).** Let $\mathcal{T} = (\Sigma, E)$ be an $\mathbf{M}$-graded theory. We define a functor $F^{\mathcal{T}}X : \mathbf{M} \to \mathbf{Set}$ by $F^{\mathcal{T}}Xm := T^\Sigma_m(X)/{\sim_m}$ for each $m \in \mathbf{M}$ and any $X \in \mathbf{Set}$, where $\sim_m$ is the equivalence relation defined by $\mathcal{T} \vdash s = t$, and by $F^{\mathcal{T}}Xw([t]_m) := [c_w(t)]_{m'}$ for any $w : m \to m'$, where $[t]_m$ is the equivalence class of $t \in T^\Sigma_m(X)$. For each $f \in \Sigma_{n,m'}$, let $|f|^{F^{\mathcal{T}}X} : (F^{\mathcal{T}}X)^n \to m' \circledast F^{\mathcal{T}}X$ be the mapping defined by $|f|^{F^{\mathcal{T}}X}_m([t_1]_m,\dots,[t_n]_m) = [f(t_1,\dots,t_n)]_{m' \otimes m}$ for each $m \in \mathbf{M}$. We define a model of $\mathcal{T}$ by $\mathcal{F}^{\mathcal{T}}X = (F^{\mathcal{T}}X, |\cdot|^{F^{\mathcal{T}}X})$.

The model $\mathcal{F}^{\mathcal{T}}X$, together with the mapping $\eta_X : X \to F^{\mathcal{T}}XI$ defined by $x \mapsto [x]_I$, has the following universal property as a free model generated by $X$.

**Lemma 17.** *For any model $\mathcal{A}$ in $[\mathbf{M},\mathbf{Set}]_0$ and any mapping $v : X \to AI$, there exists a unique homomorphism $\bar v : \mathcal{F}^{\mathcal{T}}X \to \mathcal{A}$ satisfying $\bar v_I \circ \eta_X = v$.*

**Corollary 18.** *Let $U : \mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0) \to \mathbf{Set}$ be the forgetful functor defined by evaluation at $I$, that is, $U\mathcal{A} = AI$ and $U\alpha = \alpha_I$. The free model functor $F^{\mathcal{T}} : \mathbf{Set} \to \mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ is a left adjoint of $U$.*

By considering the interpretation in the free model, we obtain the following completeness theorem.

**Theorem 19 (completeness).** *$\mathcal{T}, [\mathbf{M},\mathbf{Set}]_0 \models s = t$ implies $\mathcal{T} \vdash s = t$.*

Recall that $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ has a left $\mathbf{M}$-action (Corollary 13). Therefore the above adjunction induces an $\mathbf{M}$-graded monad as described in [7].

The relationship between $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$ and the Eilenberg–Moore construction is as follows. In [7], the Eilenberg–Moore category $\mathbf{C}^{\mathbf{T}}$ for any graded monad $\mathbf{T}$ on $\mathbf{C}$ is introduced together with a left action $\cdot : \mathbf{M} \times \mathbf{C}^{\mathbf{T}} \to \mathbf{C}^{\mathbf{T}}$. If $\mathbf{C} = \mathbf{Set}$ and $\mathbf{T}$ is the graded monad obtained from an $\mathbf{M}$-graded theory $\mathcal{T}$, then the Eilenberg–Moore category $\mathbf{Set}^{\mathbf{T}}$ is essentially the same as $\mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0)$.

**Theorem 20.** *The comparison functor $K : \mathrm{Mod}_{\mathcal{T}}([\mathbf{M},\mathbf{Set}]_0) \to \mathbf{Set}^{\mathbf{T}}$ (see [7] for the definition), where $\mathcal{T}$ is an $\mathbf{M}$-graded theory and $\mathbf{T}$ is the graded monad induced from the graded theory $\mathcal{T}$, is an isomorphism. Moreover, $K$ preserves the $\mathbf{M}$-action: $\cdot \circ (\mathbf{M} \times K) = K \circ \cdot$.*

We define the category $\mathbf{GS_M}$ of graded algebraic theories as follows.

**Definition 21.** Let $\mathcal{T} = (\Sigma, E)$ and $\mathcal{T}' = (\Sigma', E')$. A morphism $\alpha : \mathcal{T} \to \mathcal{T}'$ between graded algebraic theories is a family of mappings $\alpha_{n,m} : \Sigma_{n,m} \to F^{\mathcal{T}'}nm$ from operations in $\Sigma$ to $\Sigma'$-terms such that the equations in $E$ are preserved by $\alpha$, i.e. for each $s, t \in T^\Sigma_m(X)$, $(s,t) \in E$ implies $|s|^{(F^{\mathcal{T}'}X,\alpha)} = |t|^{(F^{\mathcal{T}'}X,\alpha)}$, where $(F^{\mathcal{T}'}X, \alpha)$ is the model of $\mathcal{T}$ induced by $\alpha$.

**Definition 22.** Given a morphism $\alpha : \mathcal{T} \to \mathcal{T}'$, let $F^\alpha : F^{\mathcal{T}} \to F^{\mathcal{T}'}$ be the natural transformation defined by $F^\alpha([t]) = |t|^{(F^{\mathcal{T}'}X,\alpha)}$ for each $t \in T^\Sigma_m(X)$.

**Definition 23.** We write $\mathbf{GS_M}$ for the category of graded algebraic theories and morphisms between them. The identity morphisms are defined by $1_{\mathcal{T}}(f) = [f(x_1,\dots,x_n)]$ for each $f \in \Sigma_{n,m}$. The composition of $\alpha : \mathcal{T} \to \mathcal{T}'$ and $\beta : \mathcal{T}' \to \mathcal{T}''$ is defined by $\beta \circ \alpha(f) = F^\beta(\alpha(f))$.

#### **3.3 Examples**

**Example 24 (graded modules).** Let $\mathbf{M} = (\mathbb{N}, +, 0)$ where $\mathbb{N}$ is regarded as a discrete category. Given a graded ring $A = \bigoplus_{n \in \mathbb{N}} A_n$, let $\Sigma$ be the set of operations consisting of the binary addition operation $+$ (arity: 2, grade: 0), the unary inverse operation $-$ (arity: 1, grade: 0), the identity element (nullary operation) $0$ (arity: 0, grade: 0), and the unary scalar multiplication operation $a \cdot (-)$ (arity: 1, grade: $n$) for each $a \in A_n$. Let $E$ be the equational axiom for modules.

A model $(F, |\cdot|)$ of the $\mathbf{M}$-graded theory $(\Sigma, E)$ in $[\mathbf{M},\mathbf{Set}]_0$ consists of a set $F_n$ for each $n \in \mathbb{N}$ and functions $|+|_n : (F_n)^2 \to F_n$, $|-|_n : F_n \to F_n$, $|0|_n \in F_n$, and $|a \cdot (-)|_n : F_n \to F_{m+n}$ for each $n \in \mathbb{N}$ and each $a \in A_m$, such that these interpretations satisfy $E$. Therefore, models of $(\Sigma, E)$ in $[\mathbf{M},\mathbf{Set}]_0$ correspond one-to-one with graded modules.
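As a concrete sanity check (our own illustration, not part of the paper), take the graded ring $A = \mathbb{Q}[x]$ with $A_m$ the homogeneous degree-$m$ polynomials, acting on itself as a graded module with $F_n = A_n$. An element of $F_n$ is recorded as a pair `(c, n)` standing for $c\,x^n$, which makes the grade shift of scalar multiplication visible:

```python
# Elements of the graded module are pairs (c, n) meaning c * x^n in F_n = A_n.

def add(u, v):                       # |+|_n : (F_n)^2 -> F_n  (grade 0)
    (c1, n1), (c2, n2) = u, v
    assert n1 == n2, "addition is defined gradewise"
    return (c1 + c2, n1)

def neg(u):                          # |-|_n : F_n -> F_n  (grade 0)
    c, n = u
    return (-c, n)

def zero(n):                         # |0|_n in F_n  (grade 0)
    return (0.0, n)

def smul(a, m, u):                   # |a·(-)|_n : F_n -> F_{m+n} for a*x^m in A_m
    c, n = u
    return (a * c, m + n)            # (a x^m)(c x^n) = (a c) x^{m+n}

u = smul(2.0, 2, add((3.0, 1), (5.0, 1)))   # (2x^2) · (3x + 5x)
print(u)  # (16.0, 3): the grade moved from 1 to 2 + 1 = 3
```

The only graded operation is scalar multiplication; addition, inverse, and zero stay within a single grade, exactly as in the signature above.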

**Example 25 (graded exception monad [12, Example 3.4]).** We give an algebraic presentation of the graded exception monad.

Let $\mathrm{Ex}$ be a set of exceptions, let $P^+(X)$ denote the set of nonempty subsets of $X$, and let $\mathbf{M} = ((P^+(\mathrm{Ex} \cup \{\mathrm{Ok}\}), \subseteq), I, \otimes)$ be the preordered monoid where $I = \{\mathrm{Ok}\}$ and the multiplication $\otimes$ is defined by $m \otimes m' = (m \setminus \{\mathrm{Ok}\}) \cup m'$ if $\mathrm{Ok} \in m$ and $m \otimes m' = m$ otherwise (note that this is not commutative). The graded exception monad $(*, \eta, \mu)$ is the $\mathbf{M}$-graded monad given as follows.

$$\begin{aligned} m * X &= \{\mathrm{Er}(e) \mid e \in m \setminus \{\mathrm{Ok}\}\} \cup \{\mathrm{Ok}(x) \mid x \in X \wedge \mathrm{Ok} \in m\} \\ \eta_X(x) &= \mathrm{Ok}(x) \qquad \mu_{m_1,m_2,X}(\mathrm{Er}(e)) = \mathrm{Er}(e) \qquad \mu_{m_1,m_2,X}(\mathrm{Ok}(x)) = x \end{aligned}$$

The $\mathbf{M}$-graded theory $\mathcal{T}_{\mathrm{ex}}$ for the graded exception monad is defined by $(\Sigma_{\mathrm{ex}}, \emptyset)$, where $\Sigma_{\mathrm{ex}}$ is the set consisting of an operation $\mathsf{raise}_e$ (arity: 0, grade: $\{e\}$) for each $e \in \mathrm{Ex}$.

The graded monad induced by $\mathcal{T}_{\mathrm{ex}}$ coincides with the graded exception monad. Indeed, the free model functor $F^{\mathcal{T}_{\mathrm{ex}}}$ for $\mathcal{T}_{\mathrm{ex}}$ is given by $F^{\mathcal{T}_{\mathrm{ex}}}Xm = m * X$, where, for each $e \in \mathrm{Ex}$, the operation $\mathsf{raise}_e$ is interpreted by

$$|\mathsf{raise}_e|^{F^{\mathcal{T}_{\mathrm{ex}}}X}_m = \mathrm{Er}(e) \in F^{\mathcal{T}_{\mathrm{ex}}}X(\{e\} \otimes m).$$
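The graded exception monad is small enough to run directly. The following Python sketch (our own illustration; `tensor`, `unit`, `mult`, and `raise_` are names we chose) represents a grade $m$ as a `frozenset` over $\mathrm{Ex} \cup \{\mathrm{Ok}\}$ and an element of $m * X$ as a tagged pair:

```python
# Illustrative sketch (our names): a grade m is a frozenset over Ex ∪ {"Ok"},
# and an element of m * X is a tagged pair ("Er", e) or ("Ok", x).

OK = "Ok"

def tensor(m1, m2):
    """m1 ⊗ m2 = (m1 \\ {Ok}) ∪ m2 if Ok ∈ m1, and m1 otherwise."""
    return (m1 - {OK}) | m2 if OK in m1 else m1

def unit(x):
    """η_X : X -> I * X with I = {Ok}."""
    return ("Ok", x)

def mult(v):
    """μ : m1 * (m2 * X) -> (m1 ⊗ m2) * X collapses a nested result."""
    return v[1] if v[0] == "Ok" else v

def raise_(e):
    """Interpretation of raise_e: the constant Er(e) of grade {e}."""
    return ("Er", e)

m = frozenset({"DivByZero", OK})        # may fail with DivByZero or succeed
print(tensor(m, frozenset({OK})) == m)  # True: sequencing with a pure step keeps the grade
print(mult(unit(unit(42))))             # ('Ok', 42)
print(mult(raise_("DivByZero")))        # ('Er', 'DivByZero')
```

Note how `tensor` encodes short-circuiting: once $\mathrm{Ok}$ is no longer in the grade, later grades are discarded, which is exactly why $\otimes$ fails to be commutative.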

**Example 26 (extending an ordinary monad to an $\mathbf{M}$-graded monad).** We consider the problem of extending an $\mathbf{M}'$-graded theory to an $\mathbf{M}$-graded theory along a lax monoidal functor of type $\mathbf{M}' \to \mathbf{M}$, but here we restrict ourselves to the case of $\mathbf{M}' = \mathbf{1}$ and the strict monoidal functor of type $\mathbf{1} \to \mathbf{M}$.

Let $\mathbf{M} = (\mathbf{M}, I, \otimes)$ be an arbitrary small strict monoidal category. Let $\mathcal{T} = (\Sigma, E)$ be a ($\mathbf{1}$-graded) theory and $(T, \eta^T, \mu^T)$ be the corresponding ordinary monad. Let $\mathcal{T}^{\mathbf{M}} = (\Sigma^{\mathbf{M}}, E^{\mathbf{M}})$ be the $\mathbf{M}$-graded theory obtained when we regard each operation in $\mathcal{T}$ as an operation with grade $I \in \mathbf{M}$, that is, $\Sigma^{\mathbf{M}}_{n,m} := \Sigma_n$ if $m = I$ and $\Sigma^{\mathbf{M}}_{n,m} := \emptyset$ otherwise, and $E^{\mathbf{M}} := E$.

The free model functor for $\mathcal{T}^{\mathbf{M}}$ is $F^{\mathcal{T}^{\mathbf{M}}}X = F^{\mathcal{T}}(\mathbf{M}(I,-) \times X)$, where $F^{\mathcal{T}} : \mathbf{Set} \to \mathrm{Mod}_{\mathcal{T}}(\mathbf{Set})$ is the free model functor for $\mathcal{T}$ as a $\mathbf{1}$-graded theory, and the interpretation of an operation $f \in \Sigma_n$ in $F^{\mathcal{T}^{\mathbf{M}}}X$ is defined by the interpretation in the free models of $\mathcal{T}$:

$$|f|^{F^{\mathcal{T}^{\mathbf{M}}}X}_m = |f|^{F^{\mathcal{T}}(\mathbf{M}(I,m) \times X)} : \left(F^{\mathcal{T}}(\mathbf{M}(I,m) \times X)\right)^n \to F^{\mathcal{T}}(\mathbf{M}(I,m) \times X).$$

Intuitively, this can be understood as follows. Since all the operations are of grade $I$, coercions $c_w$ in a term can be moved to the innermost positions, where variables occur, by repeatedly applying $c_w(f(t_1,\dots,t_n)) = f(c_w(t_1),\dots,c_w(t_n))$ (see Definition 14). Therefore, we can consider terms of $\mathcal{T}^{\mathbf{M}}$ as terms of $\mathcal{T}$ whose variables are of the form $c_w(x)$.

The $\mathbf{M}$-graded monad $(*, \eta, \mu)$ obtained from $\mathcal{T}^{\mathbf{M}}$ is as follows.

$$m * X = T(\mathbf{M}(I,m) \times X) \qquad \eta = \eta^T \circ (1_I, -) \qquad \mu = T(\hat\otimes \times X) \circ \mu^T \circ T(\mathrm{st})$$

Here, $\hat\otimes : \mathbf{M}(I,m_1) \times \mathbf{M}(I,m_2) \to \mathbf{M}(I, m_1 \otimes m_2)$ is induced by $\otimes : \mathbf{M} \times \mathbf{M} \to \mathbf{M}$, and $\mathrm{st}_{X,Y} : X \times TY \to T(X \times Y)$ is the strength for $T$.
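To see these formulas compute, here is a Python sketch (our own illustration) instantiating the construction with $T$ the list monad and $\mathbf{M}$ the preordered monoid $(\mathbb{N}, \leq, +, 0)$, so that every hom-set $\mathbf{M}(I,m)$ is the singleton `{"*"}`:

```python
# Illustrative sketch: T = the list monad, M = the preordered monoid
# (N, <=, +, 0), so every hom-set M(I, m) is the singleton {"*"} and
# m * X = List(M(0, m) × X).  We follow μ = T(⊗̂ × X) ∘ μ^T ∘ T(st).

def eta(x):
    """η = η^T ∘ (1_I, -): pair the value with the unique arrow 1_I."""
    return [("*", x)]

def strength(w, inner):
    """st : X × T Y -> T(X × Y) for the list monad."""
    return [(w, p) for p in inner]

def mu(t):
    """μ : m1 * (m2 * X) -> (m1 ⊗ m2) * X, following the displayed formula."""
    t = [strength(w, inner) for (w, inner) in t]   # T(st)
    t = [p for chunk in t for p in chunk]          # μ^T: flatten
    return [("*", x) for (_w1, (_w2, x)) in t]     # T(⊗̂ × X): "*" ⊗̂ "*" = "*"

nested = [("*", [("*", 1), ("*", 2)]), ("*", [("*", 3)])]
print(mu(nested))  # [('*', 1), ('*', 2), ('*', 3)]
```

With this degenerate $\mathbf{M}$ the hom-set component carries no information and $m * X \cong TX$; the point of the sketch is only that the three-step pipeline for $\mu$ typechecks and reduces to the ordinary list-monad multiplication.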

## **4 Graded Lawvere Theories**

We present a categorical formulation of graded algebraic theories of §3 in a similar fashion to ordinary Lawvere theories.

For ordinary (single-sorted) finitary algebraic theories, a *Lawvere theory* is defined as a small category $\mathbf{L}$ with finite products together with a strict finite-product-preserving identity-on-objects functor $J : \aleph_0^{\mathrm{op}} \to \mathbf{L}$, where $\aleph_0$ is the full subcategory of $\mathbf{Set}$ on natural numbers. Intuitively, morphisms in the Lawvere theory $\mathbf{L}$ are terms of the corresponding algebraic theory, and objects of $\mathbf{L}$, which are exactly the objects in $\mathrm{ob}\,\aleph_0$, are arities.

According to the above intuition, it is expected that a graded Lawvere theory is also defined as a category whose objects are natural numbers and whose morphisms are graded terms. However, since terms in a graded algebraic theory are stratified by a monoidal category $\mathbf{M}$, mere sets are insufficient to express the hom-objects of graded Lawvere theories. Instead, we take hom-objects from the functor category $[\mathbf{M},\mathbf{Set}]_0$ and define graded Lawvere theories using $[\mathbf{M},\mathbf{Set}]_0$-categories, where $[\mathbf{M},\mathbf{Set}]_0$ is equipped with the Day convolution monoidal structure. Specifically, $\aleph_0$ (in ordinary Lawvere theories) is replaced with an $[\mathbf{M},\mathbf{Set}]_0$-category $\mathbf{N_M}$, $\mathbf{L}$ with an $[\mathbf{M},\mathbf{Set}]_0$-category, and "finite products" with "$\mathbf{N^{op}_M}$-cotensors".

So, we first provide the enriched category $\mathbf{N_M}$ that we use as arities. Since we do not assume that $\mathbf{M}$ is symmetric, $\mathbf{N_M}$ is defined to be an $[\mathbf{M},\mathbf{Set}]_0^t$-category so that the opposite category $\mathbf{N^{op}_M}$ is an $[\mathbf{M},\mathbf{Set}]_0$-category. Let $[\mathbf{M},\mathbf{Set}]^t$ be the $[\mathbf{M},\mathbf{Set}]_0^t$-category induced by the closed structure of $[\mathbf{M},\mathbf{Set}]_0^t$. That is, the hom-objects of $[\mathbf{M},\mathbf{Set}]^t$ are given by $[\mathbf{M},\mathbf{Set}]^t(G,H)m = [\mathbf{M},\mathbf{Set}]_0(G, H(- \otimes m))$.

**Definition 27.** An $[\mathbf{M},\mathbf{Set}]_0^t$-category $\mathbf{N_M}$ is defined as the full sub-$[\mathbf{M},\mathbf{Set}]_0^t$-category of $[\mathbf{M},\mathbf{Set}]^t$ whose set of objects is given by $\mathrm{ob}\,\mathbf{N_M} = \{n \cdot y(I) \mid n \in \mathbb{N}\} \subseteq \mathrm{ob}\,[\mathbf{M},\mathbf{Set}]^t$, where $\mathbb{N}$ is the set of natural numbers and $n \cdot y(I)$ is the $n$-fold coproduct of $y(I)$. We sometimes identify $\mathrm{ob}\,\mathbf{N_M}$ with $\mathbb{N}$ via the mapping $n \mapsto \overline{n} := n \cdot y(I)$.

**Lemma 28.** *The $[\mathbf{M},\mathbf{Set}]_0$-category $\mathbf{N^{op}_M}$ has $\mathbf{N^{op}_M}$-cotensors, which are given by $\overline{n} \pitchfork \overline{n'} = \overline{n \cdot n'}$ for each $n$ and $n'$.*

*Proof.* A cotensor $(n \cdot y(I)) \pitchfork (n' \cdot y(I))$ is a tensor $(n \cdot y(I)) \otimes^t (n' \cdot y(I))$ in $[\mathbf{M},\mathbf{Set}]^t$. Since $\otimes^t$ is biclosed, $\otimes^t$ preserves colimits in both arguments. Therefore, $(n \cdot y(I)) \otimes^t (n' \cdot y(I)) \cong (n \cdot n') \cdot y(I)$. □

$\mathbf{N^{op}_M}$-cotensors (i.e. $\overline{n} \pitchfork C$) behave like an enriched counterpart of finite powers $(-)^n$. We show that $\mathbf{N^{op}_M}$-cotensors in a general $[\mathbf{M},\mathbf{Set}]_0$-category $\mathbf{A}$ are characterized by projections satisfying a universal property. Given a unit morphism $\nu : \overline{n} \to \mathbf{A}(\overline{n} \pitchfork C, C)$ of the cotensor $\overline{n} \pitchfork C$, an $[\mathbf{M},\mathbf{Set}]_0$-natural transformation $\bar\nu : \mathbf{A}(B, \overline{n} \pitchfork C) \to [\overline{n}, \mathbf{A}(B,C)]$ is given by $f \mapsto (x \mapsto \nu(x) \circ f)$. The condition that $\bar\nu$ is an isomorphism can be rephrased as follows.

**Lemma 29.** *An $[\mathbf{M},\mathbf{Set}]_0$-category $\mathbf{A}$ has $\mathbf{N^{op}_M}$-cotensors if and only if for any $n \in \mathbb{N}$ and $C \in \mathrm{ob}\,\mathbf{A}$, there exist an object $\overline{n} \pitchfork C \in \mathrm{ob}\,\mathbf{A}$ and $(\pi_1,\dots,\pi_n) \in (\mathbf{A}(\overline{n} \pitchfork C, C)I)^n$ such that the following condition holds: for each $m$, the function $f \mapsto (\pi_1 \circ f, \dots, \pi_n \circ f)$ of type $\mathbf{A}(B, \overline{n} \pitchfork C)m \to (\mathbf{A}(B,C)m)^n$ is bijective.*

*An $[\mathbf{M},\mathbf{Set}]_0$-functor $F : \mathbf{A} \to \mathbf{B}$ preserves $\mathbf{N^{op}_M}$-cotensors if and only if $(F_{\overline{n} \pitchfork C, C, I}(\pi_1), \dots, F_{\overline{n} \pitchfork C, C, I}(\pi_n)) \in (\mathbf{B}(F(\overline{n} \pitchfork C), FC)I)^n$ satisfies the same condition for each $n$ and $C$.*

*Proof.* The essence of the proof is that the unit morphism $\nu : n \cdot y(I) \to \mathbf{A}(\overline{n} \pitchfork C, C)$ corresponds to elements $\pi_1,\dots,\pi_n \in \mathbf{A}(\overline{n} \pitchfork C, C)I$ by $[\mathbf{M},\mathbf{Set}]_0(n \cdot y(I), \mathbf{A}(\overline{n} \pitchfork C, C)) \cong [\mathbf{M},\mathbf{Set}]_0(y(I), \mathbf{A}(\overline{n} \pitchfork C, C))^n \cong (\mathbf{A}(\overline{n} \pitchfork C, C)I)^n$. The $[\mathbf{M},\mathbf{Set}]_0$-natural transformation $\bar\nu$ is an isomorphism if and only if each component $\bar\nu_m : \mathbf{A}(B, \overline{n} \pitchfork C)m \to [\overline{n}, \mathbf{A}(B,C)]m$ of $\bar\nu$ is an isomorphism, which is moreover equivalent to the condition that $f \mapsto (\pi_1 \circ f, \dots, \pi_n \circ f) : \mathbf{A}(B, \overline{n} \pitchfork C)m \to (\mathbf{A}(B,C)m)^n$ is bijective, since we have $[\overline{n}, \mathbf{A}(B,C)]m \cong (\mathbf{A}(B,C)m)^n$.

The latter part of the lemma follows from the former part. □

If $(\pi_1,\dots,\pi_n) \in (\mathbf{A}(\overline{n} \pitchfork C, C)I)^n$ satisfies the condition in Lemma 29, we call the element $\pi_i \in \mathbf{A}(\overline{n} \pitchfork C, C)I$ the *$i$-th projection* of $\overline{n} \pitchfork C$. Note that the choice of projections is not necessarily unique. However, when we say that $\mathbf{A}$ is an $[\mathbf{M},\mathbf{Set}]_0$-category with $\mathbf{N^{op}_M}$-cotensors, we implicitly assume that there are a chosen cotensor $\overline{n} \pitchfork C$ and chosen projections $(\pi_1,\dots,\pi_n) \in (\mathbf{A}(\overline{n} \pitchfork C, C)I)^n$ for each $n \in \mathrm{ob}\,\mathbf{N^{op}_M}$ and $C \in \mathrm{ob}\,\mathbf{A}$. We also assume that $\overline{1} \pitchfork X = X$ without loss of generality. Given an $n$-tuple $(f_1,\dots,f_n)$ of elements in $\mathbf{A}(B,C)m$, we denote by $\langle f_1,\dots,f_n \rangle$ the element in $\mathbf{A}(B, \overline{n} \pitchfork C)m$ obtained by the inverse of $f \mapsto (\pi_1 \circ f, \dots, \pi_n \circ f)$, and call this a tupling. Tuplings and projections for $\mathbf{N^{op}_M}$-cotensors behave like those for finite products.

The following proposition claims that $\mathbf{N^{op}_M}$ is a free $[\mathbf{M},\mathbf{Set}]_0$-category with chosen $\mathbf{N^{op}_M}$-cotensors generated by one object.

**Proposition 30.** *Let $\mathbf{A}$ be an $[\mathbf{M},\mathbf{Set}]_0$-category with $\mathbf{N^{op}_M}$-cotensors and $C$ be an object in $\mathbf{A}$. Then there exists a unique $\mathbf{N^{op}_M}$-cotensor-preserving $[\mathbf{M},\mathbf{Set}]_0$-functor $F : \mathbf{N^{op}_M} \to \mathbf{A}$ such that $F\overline{n} = \overline{n} \pitchfork C$ and $F\pi_i = \pi_i$.*

We define **M**-graded Lawvere theories in a similar fashion to enriched Lawvere theories.

**Definition 31.** An $\mathbf{M}$-graded Lawvere theory is a tuple $(\mathbf{L}, J)$ where $\mathbf{L}$ is an $[\mathbf{M},\mathbf{Set}]_0$-category with $\mathbf{N^{op}_M}$-cotensors and $J : \mathbf{N^{op}_M} \to \mathbf{L}$ is an identity-on-objects $\mathbf{N^{op}_M}$-cotensor-preserving $[\mathbf{M},\mathbf{Set}]_0$-functor. A morphism $F : (\mathbf{L}, J) \to (\mathbf{L}', J')$ between two graded Lawvere theories is an $[\mathbf{M},\mathbf{Set}]_0$-functor $F : \mathbf{L} \to \mathbf{L}'$ such that $FJ = J'$. We denote the category of graded Lawvere theories and morphisms between them by $\mathbf{GLaw_M}$.

By Proposition 30, the existence of the above $J : \mathbf{N^{op}_M} \to \mathbf{L}$ is equivalent to requiring that $\mathrm{ob}\,\mathbf{L} = \mathbb{N}$ and that projections in $\mathbf{L}$ are chosen in some way. So, we sometimes leave $J$ implicit and just write $\mathbf{L} \in \mathbf{GLaw_M}$ for $(\mathbf{L}, J) \in \mathbf{GLaw_M}$.

**Definition 32.** A *model* of a graded Lawvere theory $\mathbf{L}$ in an $[\mathbf{M},\mathbf{Set}]_0$-category $\mathbf{A}$ with $\mathbf{N^{op}_M}$-cotensors is an $\mathbf{N^{op}_M}$-cotensor-preserving $[\mathbf{M},\mathbf{Set}]_0$-functor of type $\mathbf{L} \to \mathbf{A}$. A morphism $\alpha : F \to G$ between two models $F, G$ of the graded Lawvere theory $\mathbf{L}$ is an $[\mathbf{M},\mathbf{Set}]_0$-natural transformation. Let $\mathrm{Mod}(\mathbf{L}, \mathbf{A})$ be the category of models of $\mathbf{L}$ in the $[\mathbf{M},\mathbf{Set}]_0$-category $\mathbf{A}$.

In §3, we used a category $\mathbf{C}$ satisfying the $\mathbf{M}$-model condition to define models of a graded algebraic theory. Actually, the $\mathbf{M}$-model condition is sufficient to give an $[\mathbf{M},\mathbf{Set}]_0$-category with $\mathbf{N^{op}_M}$-cotensors.

**Lemma 33.** *If $\mathbf{C}$ satisfies the $\mathbf{M}$-model condition, then the $[\mathbf{M},\mathbf{Set}]_0$-category $(\mathbf{C}^{\mathcal{T}})^{\mathrm{op}}$ defined in Definition 7 (where $\mathcal{T}$ is the graded monad given by the $\mathbf{M}^t$-action on $\mathbf{C}$) has $\mathbf{N^{op}_M}$-cotensors.*

*Proof.* For any $X \in (\mathbf{C}^{\mathcal{T}})^{\mathrm{op}}$ and $n$, the cotensor $\overline{n} \pitchfork X$ is given by the finite power $X^n$, and the $i$-th projection is given by $\check\eta \circ \pi_i \in (\mathbf{C}^{\mathcal{T}})^{\mathrm{op}}(X^n, X)I$, where $\pi_i : X^n \to X$ is the $i$-th projection of the finite power $X^n$. The rest of the proof is routine. □

If we apply Lemma 33 to $[\mathbf{M},\mathbf{Set}]_0$ equipped with the $\mathbf{M}^t$-action in Example 9 (here denoted by $\mathcal{T}$), then $(([\mathbf{M},\mathbf{Set}]_0)^{\mathcal{T}})^{\mathrm{op}}$ coincides with $[\mathbf{M},\mathbf{Set}]$ (i.e. the $[\mathbf{M},\mathbf{Set}]_0$-category obtained by the closed structure of $[\mathbf{M},\mathbf{Set}]_0$).

## **5 Equivalence**

We have introduced three graded notions: graded algebraic theories, graded Lawvere theories, and finitary graded monads, which give rise to the categories $\mathbf{GS_M}$, $\mathbf{GLaw_M}$, and $\mathbf{GMnd_M}$, respectively. This section is about the equivalence of these three notions. We give only a sketch of the proof of the equivalence; the details are deferred to [14, Appendix A].

#### **5.1 Graded Algebraic Theories and Graded Lawvere Theories**

We prove that the category **GS**<sub>**M**</sub> of graded algebraic theories and the category **GLaw**<sub>**M**</sub> of graded Lawvere theories are equivalent by exhibiting an adjoint equivalence **Th** ⊣ U with U : **GLaw**<sub>**M**</sub> → **GS**<sub>**M**</sub>.

Let **M** be a small strict monoidal category and T = (Σ, E) an **M**-graded algebraic theory. We define **Th**<sub>T</sub> (the object part of **Th**) as the **M**-graded Lawvere theory whose morphisms are terms of T modulo the equational axioms.

**Definition 34.** An [**M**, **Set**]<sub>0</sub>-category **Th**<sub>T</sub> is defined by ob(**Th**<sub>T</sub>) := N and (**Th**<sub>T</sub>)(n, n′)m := (F<sub>T</sub> n m)<sup>n′</sup>, with composition defined by substitution.

It is easy to show that **Th**<sub>T</sub> has **N**<sup>op</sup><sub>**M**</sub>-cotensors (by Lemma 29). Therefore, **Th** maps an object of **GS**<sub>**M**</sub> to an object of **GLaw**<sub>**M**</sub>.

We define a functor U : **GLaw**<sub>**M**</sub> → **GS**<sub>**M**</sub> by taking all morphisms f ∈ **L**(n, 1)m of **L** ∈ **GLaw**<sub>**M**</sub> as operations and all equations that hold in **L** as equational axioms.

**Definition 35.** A functor U : **GLaw**<sub>**M**</sub> → **GS**<sub>**M**</sub> is defined as follows.


Then **Th**<sub>T</sub> has the following universal property, which exhibits **Th** as a left adjoint of U.

**Lemma 36.** *For each* T*, let* η<sub>T</sub> : T → U**Th**<sub>T</sub> *be the family of functions* η<sub>T,n,m</sub> : Σ<sub>n,m</sub> → F<sub>U**Th**<sub>T</sub></sub> n m *defined by* η<sub>T,n,m</sub>(f) = [f(x<sub>1</sub>, ..., x<sub>n</sub>)]*. For any* α : T → U**L***, there exists a unique morphism* ᾱ : **Th**<sub>T</sub> → **L** *such that* α = Uᾱ ◦ η<sub>T</sub>*.* □ Moreover, the unit and the counit of the adjunction **Th** ⊣ U are isomorphisms. Therefore:

**Theorem 37.** *The categories* **GS**<sub>**M**</sub> *and* **GLaw**<sub>**M**</sub> *are equivalent.* □

We can also prove the equivalence of the categories of models.

**Lemma 38.** *If* **C** *is a category satisfying the* **M***-model condition, then* Mod<sub>T</sub>(**C**) *is equivalent to* Mod(**Th**<sub>T</sub>, **C**<sub>T</sub><sup>op</sup>) *where* T *is the* **M**<sup>t</sup>*-action on* **C***.* □

#### **5.2 Graded Lawvere Theories and Finitary Graded Monads**

We prove that the category **GLaw**<sub>**M**</sub> of graded Lawvere theories and the category **GMnd**<sub>**M**</sub> of finitary graded monads are equivalent. Given a graded Lawvere theory, a finitary graded monad is obtained as a coend that represents the set of terms. Conversely, given a finitary graded monad, a graded Lawvere theory is obtained by taking the full sub-[**M**, **Set**]<sub>0</sub>-category on the arities ob(**N**<sup>op</sup><sub>**M**</sub>) of the opposite of the Kleisli(-like) category in Definition 7. These constructions give rise to an equivalence of categories.

An **M**-graded Lawvere theory yields a finitary graded monad by letting m∗X be the set of terms of grade m whose variables range over X.

**Definition 39.** Let **L** be an **M**-graded Lawvere theory. We define T<sub>**L**</sub> = (∗, η, μ), a (finitary) **M**-graded monad whose functor part is given as follows.

$$m\*X := \int^{n \in \aleph\_0} \mathbf{L}(\underline{n}, \underline{1})m \times X^n$$

Note that **L**(−, 1) : ℵ<sub>0</sub> → [**M**, **Set**]<sub>0</sub> is regarded here as a **Set**-functor.
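Operationally, a graded monad refines an ordinary monad by annotating each computation with a grade and combining grades by the monoid multiplication when computations are sequenced. The following sketch is purely illustrative and not the paper's coend construction: it tracks grades at the value level rather than in types, and it picks strings under concatenation (unit `""`) as a stand-in grading monoid.

```python
from dataclasses import dataclass
from typing import Any

# A grade-annotated computation: an element of m * X with the grade m
# tracked explicitly.  The grading monoid here is (str, +, ""), an
# arbitrary illustrative choice standing in for M.
@dataclass
class Graded:
    grade: str
    value: Any

def eta(x):
    """Unit eta : X -> I * X, where I = "" is the monoidal unit."""
    return Graded("", x)

def bind(c, k):
    """Kleisli extension: (m * X) -> (X -> m' * Y) -> (m (x) m') * Y.
    Sequencing multiplies (here: concatenates) the grades."""
    d = k(c.value)
    return Graded(c.grade + d.grade, d.value)
```

The monad laws then hold up to the monoid laws of the grades: `bind(eta(x), k) == k(x)` because the unit grade is absorbed, and associativity of `bind` follows from associativity of concatenation.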

Given a graded monad, a graded Lawvere theory is obtained as follows.

**Definition 40.** Let T = (∗, η, μ) be an **M**-graded monad on **Set**. Let **L**<sub>T</sub> be the full sub-[**M**, **Set**]<sub>0</sub>-category of (**Set**<sub>T</sub>)<sup>op</sup> with ob(**L**<sub>T</sub>) = N.

Since **L**<sub>T</sub> has **N**<sup>op</sup><sub>**M**</sub>-cotensors n ⋔ 1 = n, whose projections are given by π<sub>i</sub> = (∗ ↦ η(i)) ∈ **Set**(1, I ∗ n), **L**<sub>T</sub> is a graded Lawvere theory.

Given a morphism α : T → T′ in **GMnd**<sub>**M**</sub>, we define **L**<sub>α</sub> : **L**<sub>T</sub> → **L**<sub>T′</sub> by (**L**<sub>α</sub>)<sub>n,n′,m</sub> = **Set**(n′, α<sub>n,m</sub>) : **L**<sub>T</sub>(n, n′)m → **L**<sub>T′</sub>(n, n′)m. It is easy to prove that **L**<sub>α</sub> is a morphism in **GLaw**<sub>**M**</sub> and that **L**<sub>(−)</sub> : **GMnd**<sub>**M**</sub> → **GLaw**<sub>**M**</sub> is a functor.

**Theorem 41.** *The categories* **GLaw**<sub>**M**</sub> *and* **GMnd**<sub>**M**</sub> *are equivalent.*

*Proof.* **L**<sub>(−)</sub> is an essentially surjective, fully faithful functor. □

## **6 Combining Effects**

Under the correspondence with algebraic theories, combinations of computational effects can be understood as combinations of algebraic theories. In particular, sums and tensor products are well-known constructions [9]. In this section, we show that these constructions can be adapted to graded algebraic theories. By the equivalences **GMnd**<sub>**M**</sub> ≃ **GLaw**<sub>**M**</sub> ≃ **GS**<sub>**M**</sub> of §5, constructions such as sums and tensor products in one of these categories induce the corresponding constructions in the other two. We therefore work in **GS**<sub>**M**</sub> and describe sums as colimits in **GS**<sub>**M**</sub> and tensor products as a mapping **GS**<sub>**M**<sub>1</sub></sub> × **GS**<sub>**M**<sub>2</sub></sub> → **GS**<sub>**M**<sub>1</sub>×**M**<sub>2</sub></sub>.

#### **6.1 Sums**

We prove that **GS**<sub>**M**</sub> has small colimits.

**Lemma 42.** *The category* **GS**<sub>**M**</sub> *has small coproducts.*

*Proof.* Given a family {(Σ(i), E(i))}<sub>i∈I</sub> of objects in **GS**<sub>**M**</sub>, the coproduct is obtained as the disjoint union of operations and equations: ∐<sub>i∈I</sub>(Σ(i), E(i)) = (∐<sub>i∈I</sub> Σ(i), ∐<sub>i∈I</sub> E(i)). □

**Lemma 43.** *The category* **GS**<sub>**M**</sub> *has coequalizers.*

*Proof.* Let T = (Σ, E) and T′ = (Σ′, E′) be graded algebraic theories and α, β : T → T′ be morphisms. The coequalizer T″ of α and β is given by adding to T′ the equations induced by α and β, that is, T″ := (Σ′, E′ ∪ E″) where E″ = {(s, t) | ∃f ∈ Σ, α(f) = [s] ∧ β(f) = [t]}. □

Since a category has all small colimits if and only if it has all small coproducts and coequalizers, we obtain the following corollary.

**Corollary 44.** *The three equivalent categories* **GS**<sub>**M**</sub>*,* **GMnd**<sub>**M**</sub> *and* **GLaw**<sub>**M**</sub> *are cocomplete.* □

**Example 45.** It is known that the sum of an ordinary monad T and the exception monad (−)+Ex (where Ex is a set of exceptions) is given by T((−)+Ex) [9, Corollary 3]. We show that a similar result holds for the graded exception monad.

Let T<sub>ex</sub> be the theory in Example 25 and let **M** be the preordered monoid used there. We write (∗<sub>ex</sub>, η<sub>ex</sub>, μ<sub>ex</sub>) for the graded exception monad. Let T = (Σ, E) be a (**1**-graded) theory and (T, η<sup>T</sup>, μ<sup>T</sup>) the corresponding ordinary monad. Let T<sub>**M**</sub> = (Σ<sub>**M**</sub>, E<sub>**M**</sub>) be the **M**-graded theory obtained from T as in Example 26. We consider the graded monad obtained as the sum of T<sub>ex</sub> and T<sub>**M**</sub>.

The free model functor F for T<sub>ex</sub> + T<sub>**M**</sub> is given by (FX)m = T(m ∗<sub>ex</sub> X). For each n-ary operation f in T, |f|<sup>FX</sup>m : (T(m ∗<sub>ex</sub> X))<sup>n</sup> → T(m ∗<sub>ex</sub> X) is induced by free models of T, and for each e ∈ Ex, |raise<sub>e</sub>|<sup>FX</sup>m : 1 → T({e} ∗<sub>ex</sub> X) is defined by η<sup>T</sup><sub>{e}∗<sub>ex</sub>X</sub>(e) ∈ T({e} ∗<sub>ex</sub> X). It is easy to see that FX defined above is indeed a model of T<sub>ex</sub> + T<sub>**M**</sub>. Therefore, we obtain the graded monad m ∗ X = T(m ∗<sub>ex</sub> X).
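To see the shape of the sum monad T((−) ∗<sub>ex</sub> −) concretely, here is a hypothetical toy instance, not taken from the paper: we model m ∗<sub>ex</sub> X as the tagged union X + m (a computation is either a value or a raised exception drawn from the grade m ⊆ Ex) and take T to be the list (finite-nondeterminism) monad, so a computation is a list whose entries are values or exceptions.

```python
# Toy model of the sum of a monad T (here: lists, i.e. nondeterminism)
# and the graded exception monad, with m *ex X rendered as X + m.

def val(x):
    return ("val", x)

def exc(e):
    return ("exc", e)

def eta(x):
    """Unit of the sum monad; its grade is the empty set of exceptions."""
    return [val(x)]

def raise_(e):
    """raise_e : 1 -> T({e} *ex X), i.e. a singleton raised exception."""
    return [exc(e)]

def bind(c, k):
    """Sequencing: exceptions short-circuit, values continue.  The grade
    of the composite is the union of the component grades (not tracked
    explicitly in this value-level sketch)."""
    out = []
    for tag, payload in c:
        if tag == "exc":
            out.append((tag, payload))
        else:
            out.extend(k(payload))
    return out
```

For example, `bind(eta(3) + raise_("div0"), lambda x: eta(x + 1))` increments the value branch and propagates the raised exception unchanged.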

#### **6.2 Tensor Products**

The tensor product of two ordinary algebraic theories (Σ, E) and (Σ′, E′) is constructed as (Σ ∪ Σ′, E ∪ E′ ∪ E<sub>⊗</sub>) where E<sub>⊗</sub> consists of f(λi.g(λj.x<sub>ij</sub>)) = g(λj.f(λi.x<sub>ij</sub>)) for each f ∈ Σ and g ∈ Σ′. However, when we extend tensor products to graded algebraic theories, the grades of the two sides are not necessarily equal: if the grade of f is m and the grade of g is m′, then the grades of f(λi.g(λj.x<sub>ij</sub>)) and g(λj.f(λi.x<sub>ij</sub>)) are m ⊗ m′ and m′ ⊗ m, respectively. Therefore, we have to somehow guarantee that the grade of f ∈ Σ and the grade of g ∈ Σ′ commute. We solve this problem by taking the product of monoidal categories. That is, we define the tensor product of an **M**<sub>1</sub>-graded algebraic theory and an **M**<sub>2</sub>-graded algebraic theory as an **M**<sub>1</sub> × **M**<sub>2</sub>-graded algebraic theory.

Before defining tensor products, we consider extending an **M**-graded theory to an **M**′-graded theory along a lax monoidal functor G = (G, η<sup>G</sup>, μ<sup>G</sup>) : **M** → **M**′. Given an **M**-graded theory T = (Σ, E), we define the **M**′-graded theory G<sub>∗</sub>T = (G<sub>∗</sub>Σ, G<sub>∗</sub>E) by (G<sub>∗</sub>Σ)<sub>n,m′</sub> := {f ∈ Σ<sub>n,m</sub> | Gm = m′} and G<sub>∗</sub>E := {G<sub>∗</sub>(s) = G<sub>∗</sub>(t) | (s = t) ∈ E}, where for each term t of T (with grade m), G<sub>∗</sub>(t) is the term of G<sub>∗</sub>T (with grade Gm) defined inductively as follows: if x is a variable, then G<sub>∗</sub>(x) := c<sub>η<sup>G</sup></sub>(x); for each w : m → m′ and term t, G<sub>∗</sub>(c<sub>w</sub>(t)) := c<sub>Gw</sub>(G<sub>∗</sub>(t)); for each f ∈ Σ<sub>n,m</sub> and terms t<sub>1</sub>, ..., t<sub>n</sub> with grade m′, G<sub>∗</sub>(f(t<sub>1</sub>, ..., t<sub>n</sub>)) := c<sub>μ<sup>G</sup></sub>(f(G<sub>∗</sub>(t<sub>1</sub>), ..., G<sub>∗</sub>(t<sub>n</sub>))).
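On signatures alone, the action of G<sub>∗</sub> is just a relabelling of grades along G. A minimal sketch under hypothetical encodings (a signature as a dict from operation names to (arity, grade) pairs, G as a monoid homomorphism on grades; the inductive term translation with its inserted coercions c<sub>η<sup>G</sup></sub> and c<sub>μ<sup>G</sup></sub> is omitted):

```python
def extend_signature(sigma, G):
    """G_* Sigma: keep every operation, relabelling its grade m as G(m)."""
    return {op: (arity, G(grade)) for op, (arity, grade) in sigma.items()}

# Example: regrade a theory graded by the monoid (N, +, 0) along
# G(n) = n mod 2, a monoid homomorphism into Z/2.
sigma = {"tick": (1, 1), "tock": (1, 2), "skip": (1, 0)}
regraded = extend_signature(sigma, lambda n: n % 2)
```

The example names (`tick`, `tock`, `skip`) and the choice of monoids are illustrative only.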

The tensor product of T<sub>1</sub> ∈ **GS**<sub>**M**<sub>1</sub></sub> and T<sub>2</sub> ∈ **GS**<sub>**M**<sub>2</sub></sub> is defined by first extending T<sub>1</sub> and T<sub>2</sub> to **M**<sub>1</sub> × **M**<sub>2</sub>-graded theories and then adding commutation equations.

**Definition 46 (tensor product).** Let T<sub>1</sub> = (Σ, E) ∈ **GS**<sub>**M**<sub>1</sub></sub> and T<sub>2</sub> = (Σ′, E′) ∈ **GS**<sub>**M**<sub>2</sub></sub>. The *tensor product* T<sub>1</sub> ⊗ T<sub>2</sub> is defined as (K<sub>∗</sub>Σ ∪ K′<sub>∗</sub>Σ′, K<sub>∗</sub>E ∪ K′<sub>∗</sub>E′ ∪ E<sub>T<sub>1</sub>⊗T<sub>2</sub></sub>) ∈ **GS**<sub>**M**<sub>1</sub>×**M**<sub>2</sub></sub>, where K : **M**<sub>1</sub> → **M**<sub>1</sub> × **M**<sub>2</sub> and K′ : **M**<sub>2</sub> → **M**<sub>1</sub> × **M**<sub>2</sub> are the lax monoidal functors defined by Km<sub>1</sub> := (m<sub>1</sub>, I<sub>2</sub>) and K′m<sub>2</sub> := (I<sub>1</sub>, m<sub>2</sub>), and

$$E\_{\mathcal{T}\_1 \otimes \mathcal{T}\_2} := \{ f(\lambda i.g(\lambda j.x\_{ij})) = g(\lambda j.f(\lambda i.x\_{ij})) \mid f \in (K\_\* \Sigma)\_{n,m}, g \in (K\_\*' \Sigma')\_{n',m'} \}.$$

That is, if f is an operation in T<sub>1</sub> with grade m<sub>1</sub> ∈ **M**<sub>1</sub>, then T<sub>1</sub> ⊗ T<sub>2</sub> has the operation f with grade (m<sub>1</sub>, I<sub>2</sub>) ∈ **M**<sub>1</sub> × **M**<sub>2</sub>, and similarly for operations in T<sub>2</sub>. Tensor products satisfy the following fundamental property.

**Proposition 47.** *Let* **C** *be a category satisfying the* **M**<sub>1</sub> × **M**<sub>2</sub>*-model condition. Let* T<sub>i</sub> *be an* **M**<sub>i</sub>*-graded algebraic theory for* i = 1, 2*. Then we have an isomorphism* Mod<sub>T<sub>1</sub></sub>(Mod<sub>T<sub>2</sub></sub>(**C**)) ≅ Mod<sub>T<sub>1</sub>⊗T<sub>2</sub></sub>(**C**)*.*

*Proof.* Let ((A, |·|′), |·|) ∈ Mod<sub>T<sub>1</sub></sub>(Mod<sub>T<sub>2</sub></sub>(**C**)) be a model. For each operation f in T<sub>1</sub> with grade m, |f| : (A, |·|′)<sup>n</sup> → m ⋔ (A, |·|′) is a homomorphism. This condition is equivalent to satisfying the equations in E<sub>T<sub>1</sub>⊗T<sub>2</sub></sub>. □

**Example 48.** We illustrate the tensor product with a graded version of [9, Corollary 6], which states that the L-fold tensor product of the side-effects theory of [21] with one location is the side-effects theory with L locations.

First, we consider the situation where there is only one memory cell whose value ranges over a finite set V. Let **2** be the preordered monoid (join-semilattice) ({⊥, ⊤}, ≤, ∨, ⊥) where ≤ is the preorder defined by ⊥ ≤ ⊤. Intuitively, ⊥ represents pure computations, and ⊤ represents (possibly) stateful computations. Let T<sub>st</sub> be the **2**-graded theory with two kinds of operations, lookup (arity: V, grade: ⊤) and update<sub>v</sub> (arity: 1, grade: ⊤) for each v ∈ V, together with the four equations in [21] for the interaction of lookup and update. Note that we have to insert a coercion to match the grades of the two sides of the equation lookup(λv ∈ V.update<sub>v</sub>(x)) = c<sub>⊥≤⊤</sub>(x).

The graded monad (∗, η, μ) induced by T<sub>st</sub> is as follows.

$$\bot \ast X = X \qquad \top \ast X = (V \times X)^V \qquad ((\bot \le \top) \ast X)(x) = \lambda v.(v, x)$$

The middle equation can be explained as follows: any term with grade ⊤ can be presented by a canonical form t<sub>f</sub> := lookup(λv.update<sub>f<sub>V</sub>(v)</sub>(f<sub>X</sub>(v))) where f = ⟨f<sub>V</sub>, f<sub>X</sub>⟩ : V → V × X is a function; therefore, the mapping f ↦ t<sub>f</sub> gives a bijection between (V × X)<sup>V</sup> and ⊤ ∗ X = T<sub>Σ</sub><sup>⊤</sup>(X)/∼.
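The canonical-form argument can be checked mechanically for a small V. The sketch below (a hypothetical rendering, with a ⊤-graded computation represented extensionally as a dict sending each initial state v ∈ V to a pair (new state, result), as in ⊤ ∗ X = (V × X)<sup>V</sup>) interprets `lookup` and `update` and verifies that f ↦ t<sub>f</sub> is the identity under this representation:

```python
V = [0, 1]  # values of the single memory cell (illustrative choice)

def coerce(x):
    """(bot <= top) * X : X -> (V x X)^V, sending x to lam v.(v, x)."""
    return {v: (v, x) for v in V}

def lookup(branches):
    """lookup applied to a V-indexed family of top-graded computations:
    read the current state v, then run branches[v] in state v."""
    return {v: branches[v][v] for v in V}

def update(v_new, t):
    """update_v(t): overwrite the state with v_new, then run t."""
    return {v: t[v_new] for v in V}

def canonical(f):
    """t_f = lookup(lam v. update_{fV(v)}(fX(v))), for f : V -> V x X
    given as a dict v -> (fV(v), fX(v))."""
    return lookup({v: update(f[v][0], coerce(f[v][1])) for v in V})
```

Running `canonical(f)` returns `f` itself under this representation, which is the bijection claimed for the middle equation.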

The L-fold tensor product of T<sub>st</sub>, which we denote by T<sub>st</sub><sup>⊗L</sup>, is a **2**<sup>L</sup>-graded theory where **2**<sup>L</sup> = (2<sup>L</sup>, ⊆, ∪, ∅) is the join-semilattice of subsets of L. Specifically, T<sub>st</sub><sup>⊗L</sup> consists of operations lookup<sub>l</sub> and update<sub>l,v</sub> with grade {l} for each l ∈ L and v ∈ V, together with the three additional commutation equations in [21]. The induced graded monad is L′ ∗<sup>⊗L</sup> X = {f : V<sup>L</sup> → (V<sup>L</sup> × X) | read(L′, f) ∧ write(L′, f)} where L′ ⊆ L, and read(L′, f) and write(L′, f) assert that f depends only on the values at locations in L′ and does not change the values at locations outside L′. That is, L′ ∗<sup>⊗L</sup> X represents computations that touch only the memory locations in L′.

$$\begin{array}{rcl} \text{read}(L',f) & := & \forall \sigma, \sigma' \in V^L,\ (\forall l \in L',\ \sigma(l) = \sigma'(l)) \implies f(\sigma) = f(\sigma')\\ \text{write}(L',f) & := & \forall \sigma, \sigma' \in V^L,\ x \in X,\ (\sigma',x) = f(\sigma) \implies \forall l \notin L',\ \sigma(l) = \sigma'(l) \end{array}$$
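For a finite V and L these predicates are directly checkable by enumeration. The sketch below is illustrative only: stores σ : L → V are tuples, a computation f is a dict from stores to (new store, result) pairs, and `read` is implemented in the slightly refined form "the result and the values written at L′ depend only on the input values at L′" (which is how the predicate behaves on computations already satisfying `write`):

```python
from itertools import product

V = [0, 1]      # cell values (illustrative)
L = [0, 1, 2]   # locations (illustrative)

def states():
    """All stores sigma : L -> V, as tuples indexed by location."""
    return list(product(V, repeat=len(L)))

def read(Lp, f):
    """Result and values written at Lp depend only on the input at Lp."""
    return all(
        f[s][1] == f[t][1] and all(f[s][0][l] == f[t][0][l] for l in Lp)
        for s in states() for t in states()
        if all(s[l] == t[l] for l in Lp))

def write(Lp, f):
    """f does not change the values at locations outside Lp."""
    return all(f[s][0][l] == s[l]
               for s in states() for l in L if l not in Lp)

# A computation touching only location 0: negate cell 0, return its old value.
f0 = {s: ((1 - s[0],) + s[1:], s[0]) for s in states()}
```

Here `f0` witnesses an element of {0} ∗<sup>⊗L</sup> V: it reads and writes location 0 only.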

## **7 Related Work**

*Algebraic theories for graded monads.* Graded monads were introduced in [27], and notions of graded theory and graded Eilenberg–Moore algebra appear in [4, 17] in the context of the coalgebraic treatment of trace semantics. However, these works only deal with N-graded monads, where N is regarded as a discrete monoidal category, while we deal with general monoidal categories. The Kleisli construction and the Eilenberg–Moore construction for graded monads are presented in [7] by adapting the 2-categorical argument on resolutions of monads [29].

Algebraic operations for graded monads are introduced in [12] and classified into two types, which differ in how the grades of subterms are combined. The first type consists of operations that take terms of the same grade; these are what we treated in this paper. The second type consists of operations that take terms of different grades: the grade of f(t<sub>1</sub>, ..., t<sub>n</sub>) is determined by an *effect function* of type **M**<sup>n</sup> → **M** associated to f. Although the latter type of operation is also important for giving natural presentations of computational effects, we leave it for future work.

*Enriched Lawvere theories.* There are many variants of Lawvere theories [1, 10, 11, 15, 16, 19, 24, 25, 28], and most of them share a common pattern: they are defined as identity-on-objects functors from a certain category (e.g., ℵ<sub>0</sub><sup>op</sup>) which represents arities, and the functor must preserve a certain class of products (or cotensors, in the enriched case). The work most relevant to ours comprises enriched Lawvere theories [24] and discrete Lawvere theories [10].

For a given monoidal category **V**, a Lawvere **V**-theory is defined as an identity-on-objects finite-cotensor (i.e. **V**<sub>fp</sub>-cotensor) preserving **V**<sup>t</sup>-functor J : **V**<sub>fp</sub><sup>op</sup> → **L**, where **V**<sub>fp</sub> is the full subcategory of **V** spanned by the finitely presentable objects. If **V** = [**M**, **Set**]<sub>0</sub><sup>t</sup>, Lawvere [**M**, **Set**]<sub>0</sub><sup>t</sup>-theories are analogous to our graded Lawvere theories, except that we used **N**<sup>op</sup><sub>**M**</sub> instead of ([**M**, **Set**]<sub>0</sub>)<sub>fp</sub>. Since n · y(I) ∈ **N**<sup>op</sup><sub>**M**</sub> is finitely presentable, we can say that the notion of graded Lawvere theory is obtained from enriched Lawvere theories by restricting arities to **N**<sup>op</sup><sub>**M**</sub> ⊆ ([**M**, **Set**]<sub>0</sub>)<sub>fp</sub>. However, the correspondence with finitary graded monads on **Set** is an interesting point of our graded Lawvere theories compared to Lawvere **V**-theories, which correspond to finitary **V**-monads on **V**.

Discrete Lawvere theories restrict the arities of Lawvere **V**-theories to ℵ<sub>0</sub>; that is, a discrete Lawvere **V**-theory is defined as a (**Set**-enriched) finite-product preserving functor J : ℵ<sub>0</sub><sup>op</sup> → **L**<sub>0</sub> where **L** is a **V**<sup>t</sup>-category. In fact, discrete Lawvere [**M**, **Set**]<sub>0</sub><sup>t</sup>-theories are equivalent to graded Lawvere theories, because there is a finite-product preserving functor ι : ℵ<sub>0</sub><sup>op</sup> → **N**<sup>op</sup><sub>**M**</sub> such that composition with ι gives a bijection between graded Lawvere theories J : **N**<sup>op</sup><sub>**M**</sub> → **L** and discrete Lawvere [**M**, **Set**]<sub>0</sub><sup>t</sup>-theories J<sub>0</sub> ◦ ι : ℵ<sub>0</sub><sup>op</sup> → **L**<sub>0</sub>. However, we considered not only symmetric monoidal categories but also non-symmetric ones, which causes a nontrivial problem when we define tensor products of algebraic theories: adding commutation equations requires some kind of commutativity of the monoidal category. We solved this problem by considering product monoidal categories and defining the tensor product of an **M**<sub>1</sub>-graded theory and an **M**<sub>2</sub>-graded theory as an **M**<sub>1</sub> × **M**<sub>2</sub>-graded theory; the use of two different monoidal categories is, to the best of our knowledge, new.

## **8 Conclusions and Future Work**

To extend the correspondence between algebraic theories, Lawvere theories, and (finitary) monads, we introduced the notions of graded algebraic theory and graded Lawvere theory and proved their correspondence with finitary graded monads. We also provided sums and tensor products for graded algebraic theories, which are natural extensions of those for ordinary algebraic theories. Since we do not assume monoidal categories to be symmetric, our tensor products differ slightly from the ordinary ones in that they combine two theories graded by (or enriched in) different monoidal categories. We hope that these results will make it possible to apply many of the techniques developed for monads to graded monads.

As future work, we are interested in "change-of-effects", that is, changing the monoidal category **M** of an **M**-graded algebraic theory along a (lax) monoidal functor F : **M** → **M**′. This operation already appeared in §6.2 in the definition of tensor products, but we want to investigate more of its properties. We are also interested in integrating our work with a more general framework for notions of algebraic theory [6] and obtaining a graded version of that framework. Another direction is exploiting models of graded algebraic theories as modalities in the study of coalgebraic modal logic [4, 17] or weakest precondition semantics [8].

*Acknowledgement.* We thank Soichiro Fujii, Shin-ya Katsumata, Yuichi Nishiwaki, Yoshihiko Kakutani and the anonymous referees for useful comments. This work was supported by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603).

## **References**

1. Berger, C., Melliès, P.A., Weber, M.: Monads with arities and their associated theories. Journal of Pure and Applied Algebra **216**(8), 2029–2048 (2012). https://doi.org/10.1016/j.jpaa.2012.02.039, special issue devoted to the International Conference in Category Theory 'CT2010'


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Curry-style Semantics of Interaction: From untyped to second-order lazy** *λμ***-calculus**

James Laird

Department of Computer Science, University of Bath, UK

**Abstract.** We propose a "Curry-style" semantics of programs in which a nominal labelled transition system of types, characterizing observable behaviour, is overlaid on a nominal LTS of untyped computation. This leads to a notion of program equivalence as typed bisimulation.

Our semantics reflects the role of types as hiding operators, firstly via an axiomatic characterization of "parallel composition with hiding" which yields a general technique for establishing congruence results for typed bisimulation, and secondly via an example which captures the hiding of implementations in abstract data types: a typed bisimulation for the (Curry-style) lazy λμ-calculus with polymorphic types. This is built on an abstract machine for CPS evaluation of λμ-terms: we first give a basic typing system for this LTS which characterizes acyclicity of the environment and local control flow, and then refine this to a polymorphic typing system which uses equational constraints on instantiated type variables, inferred from observable interaction, to capture behaviour at polymorphic and abstract types.

## **1 Introduction**

"Church-style" and "Curry-style" are used to distinguish programming languages in which the type of a term is intrinsic to its definition from those in which it is an extrinsic property. The same distinction may be applied to semantics of programming languages: in many models, type-objects are essential to the interpretation of a term — e.g. as a morphism between objects (types) in a category — but interpreting terms independently of their types (as in e.g. realizability interpretations) may have conceptual and practical advantages, particularly for describing Curry-style type systems. The aim of this semantic investigation of higher-order programs is to develop a Curry-style semantics of interaction by overlaying a labelled transition system of types onto an LTS of untyped computation, so that the observable behaviour of a typed state is restricted to the actions made available by its type. Our objective is to apply this to lazy functional programs — untyped, and with Curry-style polymorphic typing systems — and to develop a theory of program equivalence, *typed bisimulation*, able to describe genericity and abstract datatypes in this setting.

**Game Semantics** Games models for programming languages are typically (but not invariably) given in a Church-style: terms are interpreted as strategies on a specified two-player game which represents their type [2,9]. This kind of semantics is compositional by definition, at the cost of forgetting the internal computational behaviour of programs, and potentially excluding system level behaviour [6]. It uses categorical structure to describe its models and prove key results — in particular *soundness* with respect to an operational semantics.

By contrast, in *operational* game semantics [15,12], programs are interpreted as states in a labelled transition system based directly on their syntax and operational semantics. Internal computation is retained but can be factored out by restricting to observable behaviour. Soundness of these models "comes for free" — instead, the fundamental property requiring non-trivial proof is that they are *compositional* — that is, that the equivalence induced on programs is a congruence. Basic structure which supports and systematizes these proofs would be useful (techniques such as Howe's method are not available in this intensional setting). We aim to show that defining operational game semantics in a Curry style gives the opportunity to formulate and apply such structure. This is complementary to the characterization of the structure of operational game semantics at a categorical level [18], into which we believe our semantics can fit well. Our motivation and general methodology bear similarities to the programme of Berger, Honda and Yoshida [3] — in which Curry-style types are used to characterize the π-calculus processes corresponding to functional and polymorphic programs — and to typing systems for process calculi such as those described in [10].

**Hiding using types** We will interpret (extrinsic) types as hiding operators: windows through which terms of a given type may interact with the world, while their internal behaviour is hidden from external observation — both passive and active. Our goal is to show that this interpretation can be used to model information hiding in two key areas of higher-order computation. The first, "parallel composition with hiding" is the fundamental operation on which game semantics is based. We axiomatize the notion of a typing system for an LTS with such an operation, in which a type is a state which characterizes precisely the possible interaction between a function and its argument at that type.

The second form of information hiding for which we give a Curry-style interpretation is the hiding of implementation details using polymorphic (existential) types as abstract data types. Our key example of a typed labelled transition system is a new model of the second-order λμ-calculus: we shall now discuss the background and significance of this contribution.

#### **1.1 Program Equivalence and Polymorphism**

Our starting point is the lazy λ-calculus — the pure, untyped λ-calculus, evaluated by weak head reduction — and its extension with first-class continuations, the corresponding version of Parigot's λμ-calculus [21]. As argued in [1], the lazy λ-calculus approximates well the behaviour of lazy functional programming languages such as Haskell, and is thus an appropriate setting in which to explore properties such as program equivalence, for which there is now a rich and well-studied theory. For instance, *open* or *normal form* bisimilarity [25] is a coinductively defined equivalence which extends β-equivalence to infinitary behaviours. It gives a purely intensional characterization of program equivalence (by contrast to e.g. applicative bisimilarity, which involves quantifying over all possible arguments) and has a variety of alternative characterizations — for instance, two terms are open bisimilar if and only if they have the same *Lévy–Longo* trees [19], or their (call-by-name) translations in the π-calculus are weakly bisimilar [25,5]. (Or, indeed, if they are normal-form bisimilar as λμ-terms.)
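Weak head reduction — the evaluation strategy of the lazy λ-calculus — contracts only the leftmost, outermost redex and never evaluates under a λ. The following small evaluator is an illustrative sketch (hypothetical encoding of untyped terms as nested tuples; substitution is naive and assumes bound names are distinct from the free names of the substituted term):

```python
# Terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)

def subst(t, x, s):
    """t[s/x]; naive: assumes no variable capture can occur."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def whnf(t, fuel=1000):
    """Reduce t to weak head normal form: a lambda, or a variable applied
    to (unevaluated) arguments.  Raises when fuel runs out, e.g. on Omega."""
    for _ in range(fuel):
        if t[0] != "app":
            return t                      # lam or var: already in whnf
        f = t[1]
        if f[0] == "lam":
            t = subst(f[2], f[1], t[2])   # head beta step
        elif f[0] == "app":
            f2 = whnf(f, fuel)            # evaluate the head spine first
            if f2 == f:
                return t                  # head stuck on a free variable
            t = ("app", f2, t[2])
        else:
            return t                      # free variable at the head
    raise RuntimeError("no weak head normal form reached (term may diverge)")
```

Note that whnf never looks inside the body of a λ, which is exactly why, e.g., λf.f λx.λy.x and λf.f λx.λy.y are both already values for this evaluator.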

Normal form bisimilarity of simply-typed λ-terms is just β-equivalence. However, extending to polymorphic types, such as those of the second-order λ-calculus (System F) [7,24], poses deeper questions. A primary motivation for introducing polymorphic types is that they can express abstract data types which hide implementation details [20] (cf. the module systems of Haskell and ML). A useful notion of program equivalence should therefore reflect this. As a simple example, the untyped λ-terms λf.f λx.λy.x and λf.f λx.λy.y are clearly not normal form bisimilar. But at the second-order type ∃X.X ≜ ∀Y.(∀X.X → Y) → Y (which they both inhabit in a Curry-style presentation), they should be behaviourally equivalent — since any function of type ∀X.(X → Y) will never call its argument. In other words, the existential type ∃X.X "hides" the difference between λf.f λx.λy.x and λf.f λx.λy.y. This is an observational equivalence, but of a particularly fundamental kind, since it (and other equivalences involving abstract data types) is robust in the presence or absence of side-effects. It can be captured by extensional methods such as applicative bisimilarity, which was extended to a polymorphic setting in [26], but this requires *quantification* over instantiating terms and types, whereas our semantics is based on *unification* of instantiating types.

The problem is that comparing the evaluation trees of terms (e.g. by normal form bisimulation) does not capture the capacity of their types to restrict interaction with the environment. Game semantics does reflect this interaction (in various manifestations), and therefore offers a potential solution. Although several games models for polymorphism do not capture data abstraction by existential types (including Hughes' semantics of System F [8], which is faithful with respect to βη-equivalence, and Curry-style models [16]) a series of related approaches does so. These include translation into the (polymorphically typed) π-calculus [4], and an operational form [17,27] and a traditional compositional presentation [14,13] of game semantics.

In these semantics, values of polymorphic variable type are interpreted as *pointers* to data of undisclosed type — e.g. a location where it is stored, or a channel on which it may be received. Instantiation of universally quantified type variables replaces this pointer-passing with copycat behaviour. This gives a natural interpretation of polymorphism in settings such as the π-calculus, or languages with general references, where pointers are first-class objects. However, it is closely associated with a Church-style presentation of second-order type systems — e.g. by the interpretation of type abstraction as an explicit creation of a pointer; in the case of "typed normal form bisimulation" [17] the translation of a term is explicitly determined by its type. This is significant because it is in the presence of polymorphism that key differences between Church-style and Curry-style emerge — for example, in allowing intersection types. The pointer-passing models also exhibit behaviours which go beyond untyped functional interaction, making their relationship to it unclear — in the game semantics [14], instantiation violates the fundamental innocence and visibility conditions on strategies; the π-calculus interpretation uses free-name as well as bound-name passing.

Curry-style semantics give a natural interpretation of second-order Curry-style typing, with a simple relationship to the semantics of the untyped λμ-calculus, by overlaying a more refined LTS of second-order types on the same underlying LTS of computations.

## **2 Typed Labelled Transition Systems**

In this section we describe a notion of typed labelled transition system and an associated equivalence: typed bisimulation. Based on this we axiomatize a simple typing system for parallel composition with hiding and show that it preserves typed bisimulation. Examples of typed LTS (in the form of models of the lazy λμ-calculus and lazy λμ2-calculus) follow in the rest of the paper.

We work in the setting of *nominal sets* [23], which allows the introduction of fresh names (for store locations, communication channels, types, etc.). Assume a fixed, infinite set of *atoms* and a group G of permutations on them. A nominal set X is a set |X| with an action of G such that each x ∈ |X| has a finite supporting set of atoms: if π(a) = a for all atoms a in this set, then π · x = x. We write sup(x) for the ⊆-least of these sets (which is the intersection of all supporting sets for x).

**Definition 1.** *A* nominal LTS *is a labelled transition system* (S, *Act*, →) *such that* S *(states) and* Act *(actions) are nominal sets and the transition relation* → *is* equivariant *— i.e. for any* $\pi \in G$*,* $C \stackrel{a}{\longrightarrow} C'$ *if and only if* $\pi \cdot C \stackrel{\pi \cdot a}{\longrightarrow} \pi \cdot C'$*.*

Similarly motivated notions of nominal LTS are developed in e.g. [22]. Our key example — an abstract machine for direct-style CPS evaluation — is given in the next section.
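Equivariance is easy to test mechanically on finite examples. The following sketch is ours, not from the paper (Python, with atoms as strings and states/actions as nested tuples); it checks whether a finite transition relation is invariant under a given permutation:

```python
# Hypothetical illustration (not from the paper): equivariance of a finite
# transition relation under a permutation of atoms, per Definition 1.

def apply_perm(perm, obj):
    """Apply a permutation of atoms (a dict) pointwise to a nested tuple;
    atoms not mentioned in the dict are fixed."""
    if isinstance(obj, tuple):
        return tuple(apply_perm(perm, o) for o in obj)
    return perm.get(obj, obj)

def is_equivariant(transitions, perm):
    """C --a--> C' holds iff perm.C --perm.a--> perm.C' holds."""
    return {apply_perm(perm, t) for t in transitions} == transitions

# A tiny LTS over atoms {a, b}, symmetric in a and b.
trans = {(("s", "a"), "call", ("t", "a")),
         (("s", "b"), "call", ("t", "b"))}
assert is_equivariant(trans, {"a": "b", "b": "a"})
assert not is_equivariant({(("s", "a"), "call", ("t", "a"))}, {"a": "b", "b": "a"})
```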

The directly observable part of a labelled transition system may be characterized by defining a *typing system* for it. (Similar notions of typing system for a process calculus are defined in [10], for example.)

**Definition 2.** *A* typing system *for a nominal LTS* (S; *Act*; →) *is a nominal LTS* (T; *Obs*; →) *such that Obs* ⊆ *Act, with a relation,* ⦂ *(typing), from* S *to* T *which satisfies the following* subject reduction *properties for each C* ⦂ T*:*

**–** *if* $C \stackrel{a}{\longrightarrow} C'$ *with* $a \in Obs$*, then* $T \stackrel{a}{\longrightarrow} T'$ *for some* $T'$ *such that* $C' ⦂ T'$*;*
**–** *if* $C \stackrel{a}{\longrightarrow} C'$ *with* $a \in Act \setminus Obs$ *and* $\sup(a) \cap \sup(T) \subseteq \sup(C)$*, then* $C' ⦂ T$*.*
Subject reduction requires that actions which are *observable* (i.e. in *Obs*) change a computation and its type in a way that respects the typing relation, and that those which are *internal* to a computation (i.e. in *Act*\*Obs*) maintain its type (provided that any names fresh for the state are also fresh for its type).

Let $\Longrightarrow$ be the reflexive, transitive closure of the internal reduction $\longrightarrow$, and define $C ⦂ T \stackrel{a}{\Longrightarrow} C' ⦂ T'$ if $C ⦂ T \Longrightarrow D ⦂ T \stackrel{a}{\longrightarrow} D' ⦂ T' \Longrightarrow C' ⦂ T'$. To define weak bisimulation between typed states based on these relations, we need to take account of the fact that a name may be fresh for one, but already occur internally in the other (cf. [22]). So bisimulation is defined up to the equivalence on the states of type $T$ which allows permutation of internal names: $C \approx_T C'$ if there exists a permutation $\pi \in \mathrm{stab}(T)$ (i.e. $\pi \cdot T = T$) such that $C' = \pi \cdot C$.

**Definition 3.** *A* typed bisimulation *is a binary, symmetric, equivariant relation* R *between typed states* (*C* ⦂ S)*, such that if* (*C* ⦂ S) R (*D* ⦂ T) *then* S = T *and:*

**–** *if* $C ⦂ T \stackrel{a}{\longrightarrow} C' ⦂ T'$ *then there exists* $D' \approx_{T} D$ *such that* $D' ⦂ T \stackrel{a}{\Longrightarrow} D'' ⦂ T'$ *and* $(C' ⦂ T')\,R\,(D'' ⦂ T')$*.*

*Typed bisimilarity is the largest typed bisimulation: states* C *and* D *are bisimilar at type* T *(*$C \sim_T D$*) if* (*C* ⦂ T) *and* (*D* ⦂ T) *are typed bisimilar.*
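On a finite LTS, the greatest bisimulation can be computed by fixpoint iteration. The sketch below is ours, not the paper's, and deliberately simplified: it computes plain strong bisimilarity, omitting the types, internal (weak) moves, and permutation of internal names that Definition 3 layers on top:

```python
# Simplified illustration: naive greatest-fixpoint computation of strong
# bisimilarity on a finite LTS (no types, weak moves, or name permutation).

def bisimilarity(states, actions, step):
    """step(s, a) returns the set of a-successors of s."""
    rel = {(s, t) for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            matched = all(
                all(any((s2, t2) in rel for t2 in step(t, a)) for s2 in step(s, a))
                and
                all(any((s2, t2) in rel for s2 in step(s, a)) for t2 in step(t, a))
                for a in actions)
            if not matched:
                rel.discard((s, t))
                changed = True
    return rel

# Two states looping on 'l' are bisimilar; a deadlocked state is not.
succ = {("p", "l"): {"p"}, ("q", "l"): {"q"}}
step = lambda s, a: succ.get((s, a), set())
rel = bisimilarity({"p", "q", "r"}, {"l"}, step)
assert ("p", "q") in rel and ("p", "r") not in rel
```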

### **2.1 Parallel Composition with Hiding**

Having proposed an interpretation of types as operators which hide internal communication, we now characterize the properties of a typing system for *parallel composition with hiding* which entail that it preserves typed bisimulation (i.e. the latter is a congruence).

**Definition 4.** *An* interaction structure *is a nominal LTS* (S; *Act*; →) *such that* $Act = L \cup (\{+,-\}\times L)$ *for some set* L *of (unpolarized) labels, with an equivariant partial binary operation* | *on* S *(*parallel composition*) such that if* $C = C_1|C_2$ *then* $C \stackrel{a}{\longrightarrow} C'$ *if and only if* $C' = C_1'|C_2'$ *for some* $C_1'$ *and* $C_2'$ *such that either:*

**–** $C_1 \stackrel{a}{\longrightarrow} C_1'$ *and* $C_2' = C_2$*, where* $(\sup(C_1') \cup \sup(a)) \cap \sup(C_2) \subseteq \sup(C_1)$*, or*
**–** $C_1' = C_1$ *and* $C_2 \stackrel{a}{\longrightarrow} C_2'$*, where* $(\sup(C_2') \cup \sup(a)) \cap \sup(C_1) \subseteq \sup(C_2)$*, or*
**–** $C_1 \stackrel{pa}{\longrightarrow} C_1'$ *and* $C_2 \stackrel{\overline{p}a}{\longrightarrow} C_2'$*, where* $p \in \{+,-\}$*.*

The nominal side-conditions require that any names which are fresh for the component to which they are introduced are fresh for the whole state.

Parallel composition is typed using a *ternary relation* between types: $T_1 \xrightarrow{T_2} T_3$ means "$T_2$ is an arrow type from $T_1$ to $T_3$" — there may be several arrow types between two types (or none).

**Definition 5.** *A typing system for an interaction structure* (Comp, L, |) *is a typing system* (T; ({+, −} × L); →) *for* Comp *with an equivariant ternary relation* $\xrightarrow{(\cdot)}$ *on* T *such that if* $T_1 \xrightarrow{T_2} T_3$ *then for any* $C_1 ⦂ T_1$ *and* $C_2 ⦂ T_2$ *such that* $\sup(C_1) \cap \sup(C_2) \subseteq \sup(T_1)$*, the state* $C_1|C_2$ *is well-defined, has type* $T_3$ *and satisfies the following* interaction conditions*:*

*1. If* $C_1 \stackrel{pl}{\longrightarrow} C_1'$ *and* $C_2 \stackrel{\overline{p}l}{\longrightarrow} C_2'$ *then* $T_1 \stackrel{pl}{\longrightarrow} T_1'$ *and* $T_2 \stackrel{\overline{p}l}{\longrightarrow} T_2'$ *such that* $T_1' \xrightarrow{T_2'} T_3$*.*
*2. If* $C_2 \stackrel{a}{\longrightarrow} C_2'$ *and* $T_3 \stackrel{a}{\longrightarrow} T_3'$ *(with* $\sup(T_3') \cap \sup(T_2) \subseteq \sup(T_3)$*) then* $T_2 \stackrel{a}{\longrightarrow} T_2'$ *such that* $T_1 \xrightarrow{T_2'} T_3'$*.*
*3. If* $C_1 \stackrel{a}{\longrightarrow} C_1'$ *and* $T_3 \stackrel{a'}{\longrightarrow} T_3'$ *then* $a \neq a'$*.*

Informally, (1) requires that if $C_1$ and $C_2$ may communicate, then this is permitted by $T_1$ and $T_2$, and (2) and (3) require that the observable actions of $C_1|C_2$ permitted by $T_3$ correspond to actions of $C_2$ permitted by $T_3$. Note that for any $C_1 ⦂ T_1$ and $C_2 ⦂ T_2$ there exists $C_1' \approx_{T_1} C_1$ such that $\sup(C_1') \cap \sup(C_2) \subseteq \sup(T_1)$ — i.e. there are no side-channels of communication between $C_1'$ and $C_2$ — and thus $C_1'|C_2$ is well-defined, has type $T_3$ and satisfies the interaction conditions. Moreover, these are sufficient to establish that typed bisimulation is a congruence with respect to parallel composition with hiding: a result that we will apply to our examples in the rest of the paper.

**Proposition 1.** *If* $C_1 \sim_{T_1} D_1$ *and* $C_2 \sim_{T_2} D_2$ *(and* $\sup(C_1) \cap \sup(C_2),\; \sup(D_1) \cap \sup(D_2) \subseteq \sup(T_1)$*) where* $T_1 \xrightarrow{T_2} T_3$*, then* $C_1|C_2 \sim_{T_3} D_1|D_2$*.*

*Proof.* We first establish the following renaming property: if $C_1 ⦂ T_1 \longrightarrow C_1' ⦂ T_1$ then there exists $\pi \in \mathrm{stab}(T_1) \cap \mathrm{stab}(T_2) \cap \mathrm{stab}(T_3)$ such that $C_1|C_2 ⦂ T_3 \longrightarrow \pi(C_1')|C_2 ⦂ T_3$ — by renaming any fresh names introduced by the internal transition so that they are also fresh for $C_2$. Similarly, any internal reduction of $C_2$ corresponds to a reduction of $C_1|C_2$, up to such a renaming.

So suppose $C_1|C_2 ⦂ T_3 \stackrel{pl}{\longrightarrow} C' ⦂ T_3'$ (an observable transition). By definition of an interaction structure, and conditions (2) and (3), $C_2 ⦂ T_2 \stackrel{pl}{\longrightarrow} C_2' ⦂ T_2'$ such that $T_1 \xrightarrow{T_2'} T_3'$. By assumption, there exists $D_2' \approx_{T_2} D_2$ such that $D_2' ⦂ T_2 \Longrightarrow D_2'' \stackrel{pl}{\longrightarrow} D_2''' ⦂ T_2' \Longrightarrow D_2'''' ⦂ T_2'$ with $D_2'''' \sim_{T_2'} C_2'$, and by the renaming property we may rename any fresh names in this reduction sequence to avoid clashes with $D_1$ — i.e. there exists $\pi \in \mathrm{stab}(T_1) \cap \mathrm{stab}(T_2) \cap \mathrm{stab}(T_3)$ such that:

$D_1|D_2' ⦂ T_3 \Longrightarrow D_1|\pi(D_2'') \stackrel{\pi(pl)}{\longrightarrow} D_1|\pi(D_2''') ⦂ T_3' \Longrightarrow D_1|\pi(D_2'''') ⦂ T_3'$, and hence $\pi^{-1}(D_1)|\pi^{-1}(D_2') ⦂ T_3 \stackrel{pl}{\Longrightarrow} \pi^{-1}(D_1)|D_2''''$ as required (since bisimilarity is closed under permutation of internal names).

If $C_1|C_2 ⦂ T_3$ performs an *internal* action then this is either an internal action of $C_1 ⦂ T_1$ or $C_2 ⦂ T_2$ — which is similar to the observable case — or else $C_1 \stackrel{pl}{\longrightarrow} C_1'$ and $C_2 \stackrel{\overline{p}l}{\longrightarrow} C_2'$, so that $C_1|C_2$ performs the internal action $l$. Then by interaction condition (1), $T_1 \stackrel{pl}{\longrightarrow} T_1'$ and $T_2 \stackrel{\overline{p}l}{\longrightarrow} T_2'$ such that $T_1' \xrightarrow{T_2'} T_3$. So since $C_1 \sim_{T_1} D_1$ and $C_2 \sim_{T_2} D_2$, there exist $D_1' \approx_{T_1} D_1$ and $D_2' \approx_{T_2} D_2$ such that $D_1' ⦂ T_1 \Longrightarrow D_1'' ⦂ T_1 \stackrel{pl}{\longrightarrow} D_1''' ⦂ T_1' \Longrightarrow D_1'''' ⦂ T_1'$ and $D_2' ⦂ T_2 \Longrightarrow D_2'' ⦂ T_2 \stackrel{\overline{p}l}{\longrightarrow} D_2''' ⦂ T_2' \Longrightarrow D_2'''' ⦂ T_2'$, where $C_1' \sim_{T_1'} D_1''''$ and $C_2' \sim_{T_2'} D_2''''$. So using the renaming property we may obtain $\pi \in \mathrm{stab}(T_1) \cap \mathrm{stab}(T_2) \cap \mathrm{stab}(T_3)$ such that $D_1'|D_2' ⦂ T_3 \Longrightarrow \pi(D_1'')|\pi(D_2'') ⦂ T_3 \longrightarrow \pi(D_1''')|\pi(D_2''') ⦂ T_3 \Longrightarrow \pi(D_1'''')|\pi(D_2'''') ⦂ T_3$ as required.

## **3 The Lazy** *λμ***-calculus**

We now define a typed interaction system giving an interpretation of the (untyped) lazy λμ-calculus — i.e. a direct-style CPS interpretation of lazy functional computation — yielding a novel, direct characterization of normal form bisimulation as typed bisimulation. This acts as a non-trivial example of a typed interaction system (as defined in the previous section) and a stepping stone to the polymorphic typing system for the same underlying language in the next section. First, we define an abstract machine for lazy CPS evaluation, in the form of a nominal LTS in which actions make explicit the calls made by a program to its environment. (Cf. the analysis of the λμ-calculus by π-calculus translation in [5].)

**Definition 6.** *The* unnamed *and* named *terms of the untyped* λμ*-calculus [21] are given (respectively) by the following grammars:*

$$t ::= x \mid \lambda x.t \mid t\,t \mid \mu\alpha.M \qquad\qquad M ::= [\alpha]t$$

We equip the set of λμ-terms with a group action by assuming a set N of distinguished identifiers, partitioned into sorts (infinite subsets) of λ-variables (x, y, z, . . .) and μ-variables (α, β, γ . . .) and (for later use) type variables (X, Y, Z, . . .). The group of sort-preserving permutations on N acts pointwise on expressions (i.e. permuting elements of N and fixing symbols not in N ). We form a nominal set of λμ-terms consisting of the terms in which the free variables are all in N and those which occur bound (by λ or μ) are not, so that the support of a term is its set of free variables.
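For concreteness, this syntax can be transcribed directly; the following datatypes are a hypothetical illustration (ours, not the paper's), computing the support of a term as its set of free λ- and μ-variables:

```python
# Hypothetical transcription of Definition 6, with free variables as support.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str            # x

@dataclass(frozen=True)
class Lam:
    var: str             # λx.t
    body: object

@dataclass(frozen=True)
class App:
    fun: object          # t t'
    arg: object

@dataclass(frozen=True)
class Mu:
    cvar: str            # μα.M
    named: object

@dataclass(frozen=True)
class Named:
    cvar: str            # [α]t
    term: object

def free_vars(t):
    """The support of a term: its free λ- and μ-variables."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    if isinstance(t, App):
        return free_vars(t.fun) | free_vars(t.arg)
    if isinstance(t, Mu):
        return free_vars(t.named) - {t.cvar}
    if isinstance(t, Named):
        return free_vars(t.term) | {t.cvar}

# [α](λx.x y) has support {α, y} (writing "α" as "alpha").
assert free_vars(Named("alpha", Lam("x", App(Var("x"), Var("y"))))) == {"alpha", "y"}
```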

Based on this syntax, we define the sets of expressions (control terms) which determine the next transition of our abstract machine.

**Definition 7.** Control terms *are given by the grammar:* A ::= M | V | K | •


As above we form a nominal set of control terms in which the support of each element is its set of free variables.

**Definition 8.** *An* environment *is a sort-respecting finite partial function* E *from* N *into the nominal sets of unnamed* λμ*-terms and continuations. The nominal set of environments has the* G*-action* $(\pi \cdot \mathcal{E})(a) = \pi \cdot \mathcal{E}(\pi^{-1} \cdot a)$*.*

Direct-style CPS evaluation of a program in an environment proceeds as follows:


These transitions are labelled with actions of the form $a\langle\vec{b}\rangle$, where $a$ is the variable called (if any) and $\vec{b}$ lists the fresh variables created (if any). Except for μ-abstraction reduction, each of these evaluation rules decomposes into a complementary pair of input and output rules, corresponding to the behaviour of the active (or "positive") part of the program and a passive (or "negative") part. This decomposition is made precise in Definition 10 (parallel composition for configurations).

**Definition 9.** *The nominal labelled transition system* Compλμ *is defined:*


$$\mathit{Act} = L \cup (\{+,-\}\times L), \quad\text{where } L = \bigcup_{(x,\alpha)\,\in\,\mathcal{N}_\lambda\times\mathcal{N}_\mu} \{\alpha,\; x\langle\alpha\rangle,\; \langle x,\alpha\rangle,\; \langle\alpha\rangle\}$$

**–** *The* transitions *are given in Table 1. By convention, a variable name mentioned on the right of a rule but not the left is assumed not to occur there.*

The *polarity* of a state is positive if the control term is a program or continuation, and negative if it is a value or the empty context (we write $V_\bullet$ for a passive term of either kind). Unpolarized transitions send positive states to positive states. Except for μ-abstraction reduction, each corresponds to complementary, positive and negative transitions, which send positive states to negative states and vice versa.

$$\begin{array}{c}
(\mathcal{E}[\alpha \mapsto K];\, [\alpha]V_{\bullet}) \xrightarrow{\;\alpha\;} (\mathcal{E};\, K[V_{\bullet}]) \\[2pt]
(\mathcal{E};\, K[(\lambda x.s)\,t]) \xrightarrow{\;\langle y,\alpha\rangle\;} (\mathcal{E}, (y \mapsto t), (\alpha \mapsto K);\, [\alpha]s[y/x]) \\[2pt]
(\mathcal{E}[x \mapsto t];\, K[x]) \xrightarrow{\;x\langle\alpha\rangle\;} (\mathcal{E}, (\alpha \mapsto K);\, [\alpha]t) \\[6pt]
(\mathcal{E};\, K[\mu\alpha.M]) \xrightarrow{\;\langle\beta\rangle\;} (\mathcal{E}, (\beta \mapsto K);\, M[\beta/\alpha]) \\[2pt]
(\mathcal{E};\, [\alpha]V_{\bullet}) \xrightarrow{\;+\alpha\;} (\mathcal{E};\, V_{\bullet}) \qquad (\mathcal{E}[\alpha \mapsto K];\, V_{\bullet}) \xrightarrow{\;-\alpha\;} (\mathcal{E};\, K[V_{\bullet}]) \\[2pt]
(\mathcal{E};\, K[\bullet\, t]) \xrightarrow{\;+\langle y,\alpha\rangle\;} (\mathcal{E}, (y \mapsto t), (\alpha \mapsto K);\, \bullet) \qquad (\mathcal{E};\, \lambda x.t) \xrightarrow{\;-\langle y,\alpha\rangle\;} (\mathcal{E};\, [\alpha]t[y/x]) \\[2pt]
(\mathcal{E};\, K[x]) \xrightarrow{\;+x\langle\alpha\rangle\;} (\mathcal{E}, (\alpha \mapsto K);\, \bullet) \qquad (\mathcal{E}[x \mapsto t];\, \bullet) \xrightarrow{\;-x\langle\alpha\rangle\;} (\mathcal{E};\, [\alpha]t)
\end{array}$$

Table 1: Abstract machine for CPS evaluation of the lazy λμ-calculus

Fig. 1: Example traces evaluating [α](λf.f λx.x)λy.y — the complementary polarized transition sequences of ( ; λf.f λx.x) and ( ; [α] • λy.y) (diagram omitted).

To define an *interaction structure* on Compλμ (Definition 4) we require a parallel composition operation on configurations.

**Definition 10 (Parallel Composition).** *On control terms, let* | *be the (least) partial operation such that* $A|\bullet = \bullet|A = A$ *and* $K|V = V|K = K[V]$*. Given configurations* $C_1 = (\mathcal{E}_1; A_1)$ *and* $C_2 = (\mathcal{E}_2; A_2)$*, let* $C_1|C_2 = (\mathcal{E}_1 \cup \mathcal{E}_2; A_1|A_2)$*, provided* $\mathrm{dom}(\mathcal{E}_1) \cap \mathrm{dom}(\mathcal{E}_2) = \emptyset$ *and* $A_1|A_2$ *is well-defined. (*$C_1|C_2$ *is undefined otherwise.)*

By inspection of the transitions in Table 1, we may see that *C*1|*C*<sup>2</sup> has precisely the transitions of *C*<sup>1</sup> or *C*<sup>2</sup> (provided any fresh names are fresh for *C*1|*C*2), together with internal transitions arising from communication between *C*<sup>1</sup> and *C*2. Therefore we have an interaction structure according to Definition 4. Figure 1 gives an illustrative example: the evaluation of [α](λf.f λx.x)λy.y — which is the parallel composition (λf.f λx.x)|([α] • λy.y) — to [α]λx.x.
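The internal transitions of this machine are close in spirit to a Krivine machine for call-by-name evaluation. As a rough, hypothetical illustration (ours, not the paper's machine: pure λ-terms only, with an argument stack playing the role of the continuations K, and no μ-binders, named terms, or polarized actions):

```python
# A minimal Krivine-style sketch of lazy (call-by-name) evaluation with an
# environment — pure λ-terms only, not the paper's abstract machine.

def evaluate(term, env=None, stack=None):
    """term: ('var', x) | ('lam', x, t) | ('app', t, s). env maps variables
    to (term, env) closures; stack holds pending arguments as closures."""
    env, stack = env or {}, stack or []
    while True:
        tag = term[0]
        if tag == 'app':                 # push the argument, focus on the function
            _, fun, arg = term
            stack.append((arg, env))
            term = fun
        elif tag == 'lam' and stack:     # bind the top of the stack (unevaluated)
            _, x, body = term
            env = dict(env)
            env[x] = stack.pop()
            term = body
        elif tag == 'var':               # look up and enter the stored closure
            term, env = env[term[1]]
        else:                            # an abstraction with no arguments: a value
            return term

# (λf.f (λx.x)) (λy.y) evaluates to the weak head-normal form λx.x,
# matching the example of Figure 1.
prog = ('app', ('lam', 'f', ('app', ('var', 'f'), ('lam', 'x', ('var', 'x')))),
               ('lam', 'y', ('var', 'y')))
assert evaluate(prog) == ('lam', 'x', ('var', 'x'))
```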

#### **3.1 A Typing System**

We now define a basic typing system for configurations which records minimal information about the control term (whether it is a program, value, continuation or empty context) but captures a more significant property of environments: acyclicity. This has practical relevance for memory management, but its immediate significance is that the second-order typing in the next section relies on the fact that an acyclic environment may be contracted into a *valuation* by iteratively replacing variables bound in the environment until none occur as free variables.

**Definition 11.** *Given a nominal environment* E*, define the binary relation on* N*:* $a <_{\mathcal{E}} b$ *if* $a \in \sup(\mathcal{E}(b))$*, and let* $<^*_{\mathcal{E}}$ *be its transitive closure. Say that* E *is a* pre-valuation *(i.e. acyclic) if this is a strict partial order — i.e.* $a \not<^*_{\mathcal{E}} a$ *for all* $a \in \mathcal{N}$*.* E *is a* valuation *if* $<_{\mathcal{E}} \,=\, <^*_{\mathcal{E}}$ *— i.e.* $\sup(\mathcal{E}(a)) \cap \mathrm{dom}(\mathcal{E}) = \emptyset$ *for all* $a \in \mathrm{dom}(\mathcal{E})$*.*

We assume a closure operation which takes an expression e and pre-valuation E to an expression E(e), obtained by replacing each atom a ∈ dom(E) with E(a) in e, with the property that $\sup(\mathcal{E}(e)) \cap \mathrm{dom}(\mathcal{E}) = \bigcup\{\sup(\mathcal{E}(a)) \cap \mathrm{dom}(\mathcal{E}) \mid a \in \sup(e) \cap \mathrm{dom}(\mathcal{E})\}$.

**Lemma 1.** *For any pre-valuation* E *there is a unique valuation* E<sup>∗</sup> *such that* E<sup>∗</sup>(E(e)) = E<sup>∗</sup>(e) *for all expressions* e*.*

*Proof.* Defining $\mathcal{E}_i$ by $\mathcal{E}_0 = \mathcal{E}$ and $\mathcal{E}_{i+1}(a) = \mathcal{E}_i(\mathcal{E}(a))$, the $\mathcal{E}_i$ form a chain of pre-valuations such that the $<_{\mathcal{E}}$-downward closure of $\{\sup(\mathcal{E}_i(a)) \cap \mathrm{dom}(\mathcal{E}) \mid a \in \mathrm{dom}(\mathcal{E})\}$ is empty or strictly decreasing, and thus is empty for some $k$ — i.e. $\mathcal{E}_k$ is a valuation, and $\mathcal{E}_k(\mathcal{E}(a)) = \mathcal{E}_k(a)$ for all $a \in \mathrm{dom}(\mathcal{E})$, so taking $\mathcal{E}^* = \mathcal{E}_k$ gives $\mathcal{E}^*(\mathcal{E}(e)) = \mathcal{E}^*(e)$ for all expressions $e$. For uniqueness: if a valuation $\mathcal{E}'$ satisfies $\mathcal{E}'(e) = \mathcal{E}'(\mathcal{E}(e))$ for all expressions $e$, then $\mathcal{E}'(e) = \mathcal{E}'(\mathcal{E}_k(e)) = \mathcal{E}_k(e)$ for all $e$.
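The iteration in this proof is directly executable. The following sketch is ours (environments as dicts, expressions as nested tuples whose leaves are atom names); it computes E* from an acyclic E:

```python
# Sketch of Lemma 1: contract a pre-valuation E into the valuation E* by
# iterating E_{i+1}(a) = E_i(E(a)). (This loops forever if E is cyclic —
# the pre-valuation condition rules that out.)

def substitute(env, term):
    """The closure operation E(e): replace each atom in dom(E) by its image."""
    if isinstance(term, tuple):
        return tuple(substitute(env, t) for t in term)
    return env.get(term, term)

def support(term):
    """All atom names occurring in a term."""
    if isinstance(term, tuple):
        return set().union(set(), *(support(t) for t in term))
    return {term}

def star(env):
    """Iterate until no atom of dom(E) occurs in any image: E* is reached."""
    cur = dict(env)
    while any(support(v) & set(env) for v in cur.values()):
        cur = {a: substitute(cur, env[a]) for a in env}
    return cur

env = {"x": ("pair", "y", "z"), "y": ("id", "w")}    # acyclic: y occurs in E(x)
v = star(env)
assert v == {"x": ("pair", ("id", "w"), "z"), "y": ("id", "w")}
assert all(not (support(t) & set(env)) for t in v.values())
```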

**Definition 12.** *The* basic *types for control terms are tuples* $\Gamma \vdash \tau; \Delta$ *where* $\tau \in \{\top, \bot\}$ *and* Γ, Δ *are non-repeating sequences — i.e. totally ordered finite sets — of* λ*- and* μ*-variables in* N*, respectively.*

*A control term* A *is well-typed with* $\Gamma \vdash \tau; \Delta$ *if* $FV(A) \subseteq \Gamma \cup \Delta$ *and* $\tau = \top$ *if and only if* A *is a value or continuation. Basic types form a nominal set with the evident pointwise* G*-action.*

Configurations are typed with polarized versions of these types. Given a polarized context (non-repeating sequence of polarized variables) $\Gamma = p_1x_1,\ldots,p_nx_n$, we write $|\Gamma|$ for the unpolarized context $x_1,\ldots,x_n$, $\overline{\Gamma}$ for the polarized context $\overline{p}_1x_1,\ldots,\overline{p}_nx_n$, and $\Gamma^{p}$ for the (unpolarized) restriction of $\Gamma$ to $p$-polarized elements.

**Definition 13.** *The nominal LTS* Tyλμ *of basic* λμ *configuration types:*


We now define a typing relation from configurations to types. Let Γ be a polarized context. A pre-valuation for Γ is a pre-valuation $\mathcal{E}$ such that $\Gamma^{+} \subseteq \mathrm{dom}(\mathcal{E})$, $\sup(\mathcal{E}(a)) \subseteq \mathrm{dom}(\mathcal{E}) \cup \Gamma^{-}$ for every $a \in \mathrm{dom}(\mathcal{E})$, and if $a, b \in \Gamma$ and $a <^*_{\mathcal{E}} b$ then $a <_{\Gamma} b$. Observe that if $\mathcal{E}$ is a pre-valuation for Γ, then $\mathcal{E}^*$ is a valuation for Γ such that for all $a \in \Gamma^{+}$, $FV(\mathcal{E}^*(a)) \subseteq \Gamma^{-}$.

$$\begin{array}{lcl}
\Gamma \vdash p\top; \Delta & \xrightarrow{\;p\langle x,\alpha\rangle\;} & \Gamma, px \vdash \overline{p}\bot; \Delta, p\alpha \\
\Gamma[\overline{p}x] \vdash p\bot; \Delta & \xrightarrow{\;px\langle\alpha\rangle\;} & \Gamma \vdash \overline{p}\bot; \Delta, p\alpha \\
\Gamma \vdash p\top; \Delta[\overline{p}\alpha] & \xrightarrow{\;p\alpha\;} & \Gamma \vdash \overline{p}\bot; \Delta \\
\Gamma \vdash p\bot; \Delta[\overline{p}\alpha] & \xrightarrow{\;p\alpha\;} & \Gamma \vdash \overline{p}\top; \Delta
\end{array}$$

Table 2: Transitions of basic configuration types

**Definition 14 (**λμ **Typing Relation).** $(\mathcal{E}; A) ⦂ (\Gamma \vdash p\tau; \Delta)$ *if* $\mathrm{pol}(\mathcal{E}; A) = p$ *and* $\mathcal{E}$ *is a pre-valuation for* $\Gamma \cup \Delta$ *such that* $\Gamma^{-} \vdash \mathcal{E}^*(A) : \tau; \Delta^{-}$*, for each* $x \in \Gamma^{+}$*,* $\Gamma^{-} \vdash \mathcal{E}^*(x) : \top; \Delta^{-}$*, and for each* $\alpha \in \Delta^{+}$*,* $\Gamma^{-} \vdash \mathcal{E}^*(\alpha) : \top; \Delta^{-}$*.*

It is straightforward to check that this satisfies the subject reduction properties and thus defines a type system for Compλμ.

*Remark 1.* We may impose a second constraint via our type system: *local control flow* — continuations are called according to a LIFO discipline and thus may be stored on a stack (in game-semantic terms, the *well-bracketing condition*). Evaluation of λ-terms by internal (and positive) transitions naturally satisfies this property — we can use types to ensure that the environment also does so.

**Definition 15.** *A configuration type* Γ pτ ; Δ *satisfies the local control condition if the polarities of* μ*-variables in* Δ *are alternating, and the polarity of the last element of* Δ *(if any) is* p*.*

Transitions for local control types are given by refining the rules for calling a continuation to enforce stack discipline:

$$\begin{array}{l}
\Gamma \vdash p\top; \Delta, \overline{p}\alpha \xrightarrow{\;p\alpha\;} \Gamma \vdash \overline{p}\bot; \Delta \\
\Gamma \vdash p\bot; \Delta, \overline{p}\alpha \xrightarrow{\;p\alpha\;} \Gamma \vdash \overline{p}\top; \Delta
\end{array}$$

Subject reduction holds with respect to λ-configurations (in which the control term, and all terms and continuations in the environment, contain no μ-abstractions).

## **3.2 A Typed Interaction Structure**

We now define an arrow relation, allowing a characterization of parallel composition with hiding for acyclic configurations. (Acyclicity is not preserved by union of environments in general, so the typing rules give a useful way of identifying pairs of configurations for which it does hold.)

**Definition 16.** *The* arrow relation *on configuration types* $T_i = \Gamma_i \vdash p_i\tau_i; \Delta_i$ *is defined pointwise —* $T_1 \xrightarrow{T_2} T_3$ *if* $\Gamma_1 \xrightarrow{\Gamma_2} \Gamma_3$*,* $\Delta_1 \xrightarrow{\Delta_2} \Delta_3$*, and* $p_1\tau_1 \xrightarrow{p_2\tau_2} p_3\tau_3$ *— where*

**–** *For any polarized contexts,* $\Sigma_1 \xrightarrow{\Sigma_2} \Sigma_3$ *if* $\Sigma_1$ *and* $\Sigma_3$ *have disjoint underlying sets of elements and* $\Sigma_2$ *is an interleaving of* $\overline{\Sigma_1}$ *and* $\Sigma_3$*.*

**–** $p_1\tau_1 \xrightarrow{p_2\tau_2} p_3\tau_3$ *iff* $p_1\tau_1 = -\bot$ *and* $p_2\tau_2 = p_3\tau_3$*, or* $p_3\tau_3 = +\bot$ *and* $p_2\tau_2 = \overline{p_1}\tau_1$*.*

It remains to show that this satisfies Definition 5.

**Proposition 2.** $(Ty_{\lambda\mu}, \xrightarrow{(\cdot)})$ *is a well-defined typing system for* $(Comp_{\lambda\mu}, |)$*.*

*Proof.* Given $C_1 = (\mathcal{E}_1; A_1)$ and $C_2 = (\mathcal{E}_2; A_2)$, suppose $C_1 ⦂ T_1$, $C_2 ⦂ T_2$ and $\sup(C_1) \cap \sup(C_2) \subseteq \sup(T_1) = |\Gamma_1| \cup |\Delta_1|$:


Moreover, it is straightforward to verify that the interaction conditions are satisfied and that we therefore have a typed interaction structure.

Thus, by Proposition 1, typed bisimilarity is preserved by parallel composition plus hiding.

**Proposition 3.** *If* $C_1 \sim_{T_1} D_1$*,* $C_2 \sim_{T_2} D_2$ *and* $T_1 \xrightarrow{T_2} T_3$ *then* $C_1|C_2 \sim_{T_3} D_1|D_2$*.*

It immediately follows that (for example) bisimilarity of values is preserved by placing them inside the same continuation — i.e. if $(\;; v)$ and $(\;; v')$ are bisimilar at type $\Gamma \vdash -\top; \Delta$ then $(\;; K[v])$ and $(\;; K[v'])$ are bisimilar at type $\Gamma \vdash +\bot; \Delta$. Moreover, if typed bisimilarity is extended to an equivalence on all λμ-terms — $s \sim_{\Gamma;\Delta} t$ if $(\;; [\alpha]s) \sim_{-\Gamma \vdash +\bot;\, -\Delta,-\alpha} (\;; [\alpha]t)$, for $\alpha \notin \Delta$ — we may use Proposition 3 to show that if $s \sim_{\Gamma;\Delta} t$ then for any compatible context, $C[s] \sim_{\Gamma;\Delta} C[t]$.

## **4 A Polymorphic Type System**

In this section we describe a more restrictive and informative typing system for the interaction structure of λμ configurations. This yields a model of the lazy λμ2-calculus — i.e. lazy λμ-calculus with polymorphic (second-order) Curry-style typing — which we now describe.

In order to fit such a type system to a semantics of lazy evaluation to weak head-normal form, we combine λ-abstraction and application with abstraction and instantiation of finite sequences of type variables — i.e. function types take the form $\forall(X_1 \ldots X_n).\sigma \to \tau$, where $X_1 \ldots X_n$ is a finite, non-repeating sequence of type variables. The judgments $\Theta \vdash \tau$ (τ is a well-formed type over the context of type variables Θ) are derived according to the rules:

$$\frac{}{\Theta[X] \vdash X} \qquad \frac{\Theta, X_1,\ldots,X_n \vdash \sigma \quad\; \Theta, X_1,\ldots,X_n \vdash \tau}{\Theta \vdash \forall(X_1 \ldots X_n).\sigma \to \tau}$$

Typing judgments are given with respect to an *equational context* (finite sequence of equations between types). These contexts play a key role in defining states in our LTS of types — they record constraints that type instantiations must satisfy. For example, if a continuation K (with a hole) of type σ is called with an argument v of type τ, then the type variables in σ and τ must have been instantiated so as to make these types equal. Formally, we define the judgment $\Theta \vdash \Xi$ (Ξ is a well-formed equational context over Θ) as follows:

$$\frac{}{\Theta \vdash \varepsilon} \qquad \frac{\Theta \vdash \Xi \quad\; \Theta \vdash \sigma \quad\; \Theta \vdash \tau}{\Theta \vdash \Xi, \sigma = \tau}$$

Type equality judgments with respect to an equational context, of the form $\Theta; \Xi \vdash \sigma = \tau$ (where $\Theta \vdash \Xi$, $\Theta \vdash \sigma$ and $\Theta \vdash \tau$), are derived according to the rules:

$$\frac{\sigma = \tau \in \Xi}{\Theta;\Xi \vdash \sigma = \tau} \qquad \frac{}{\Theta;\Xi \vdash \tau = \tau} \qquad \frac{\Theta;\Xi \vdash \sigma = \tau}{\Theta;\Xi \vdash \tau = \sigma} \qquad \frac{\Theta;\Xi \vdash \rho = \sigma \quad \Theta;\Xi \vdash \sigma = \tau}{\Theta;\Xi \vdash \rho = \tau}$$

$$\frac{\Theta;\Xi \vdash \sigma = \sigma' \quad\; \Theta;\Xi \vdash \tau = \tau'}{\Theta;\Xi \vdash \forall(\vec{X}).\sigma\to\tau \,=\, \forall(\vec{X}).\sigma'\to\tau'} \qquad \frac{\Theta;\Xi \vdash \forall(\vec{X}).\sigma\to\tau \,=\, \forall(\vec{X}).\sigma'\to\tau'}{\Theta;\Xi \vdash \sigma = \sigma'} \qquad \frac{\Theta;\Xi \vdash \forall(\vec{X}).\sigma\to\tau \,=\, \forall(\vec{X}).\sigma'\to\tau'}{\Theta;\Xi \vdash \tau = \tau'}$$

A valuation V for Θ *satisfies* an equational context $\Theta \vdash \sigma_1 = \tau_1, \ldots, \sigma_n = \tau_n$ if $\mathcal{V}(\sigma_i) \equiv \mathcal{V}(\tau_i)$ for each $i \leq n$.

**Lemma 2.** Θ; Ξ σ = τ *if and only if for all valuations* V *which satisfy* Ξ*,* V(σ) ≡ V(τ )*.*
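Satisfaction by a given valuation is easy to check directly — apply the valuation to both sides and compare for syntactic identity. A small illustration of ours (types modelled as nested tuples over type-variable leaves):

```python
# Sketch: does a valuation satisfy an equational context? (Lemma 2's ≡ is
# modelled as equality of nested tuples.)

def apply_valuation(val, ty):
    if isinstance(ty, tuple):
        return tuple(apply_valuation(val, t) for t in ty)
    return val.get(ty, ty)

def satisfies(val, equations):
    """val satisfies [(s1, t1), ...] if val(si) == val(ti) for each i."""
    return all(apply_valuation(val, s) == apply_valuation(val, t)
               for s, t in equations)

arrow = lambda s, t: ("->", s, t)
# With Y := X -> X, the context  Y = X -> X  is satisfied; with Y := X it is not.
assert satisfies({"Y": arrow("X", "X")}, [("Y", arrow("X", "X"))])
assert not satisfies({"Y": "X"}, [("Y", arrow("X", "X"))])
```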

A λμ2 type-in-context is a tuple $\Theta; \Xi; \Gamma \vdash \tau; \Delta$, where Θ is a context of type variables, Ξ is an equational context, τ is a λμ2-type (or ⊥), and Γ and Δ are (respectively) sequences of λ-variables and μ-variables and their types (all over Θ). Assigning this type to a term may be understood as asserting that "for any valuation V of the type variables in Θ which satisfies Ξ, the judgment $\mathcal{V}(\Gamma) \vdash t : \mathcal{V}(\tau); \mathcal{V}(\Delta)$ is valid". So, for example, $X, Y;\ Y = X \to X;\ \vdash \lambda x.x : Y;$ is derivable according to the rules in Table 3. Note that there are no rules for introducing or discharging equational assumptions — they will be generated by the transitions of the LTS — so the terms of type $\Theta;\,;\Gamma \vdash t : \tau; \Delta$ are precisely those derivable in second-order λμ-calculus without type equality judgments.

$$\frac{}{\Theta;\Xi;\Gamma[x:\tau] \vdash x : \tau; \Delta} \qquad \frac{\Theta;\Xi;\Gamma \vdash t : \sigma; \Delta \quad \Theta;\Xi \vdash \sigma = \tau}{\Theta;\Xi;\Gamma \vdash t : \tau; \Delta}$$

$$\frac{\Theta, X_1,\ldots,X_n;\Xi;\Gamma, x:\sigma \vdash t : \tau; \Delta}{\Theta;\Xi;\Gamma \vdash \lambda x.t : \forall(X_1 \ldots X_n).\sigma\to\tau; \Delta} \qquad \frac{\Theta;\Xi;\Gamma \vdash t : \forall(X_1 \ldots X_n).\sigma\to\tau; \Delta \quad \Theta \vdash \rho_1,\ldots,\rho_n \quad \Theta;\Xi;\Gamma \vdash s : \sigma[\rho_1/X_1 \ldots \rho_n/X_n]; \Delta}{\Theta;\Xi;\Gamma \vdash t\,s : \tau[\rho_1/X_1 \ldots \rho_n/X_n]; \Delta}$$

$$\frac{\Theta;\Xi;\Gamma \vdash t : \tau; \Delta[\alpha:\tau]}{\Theta;\Xi;\Gamma \vdash [\alpha]t : \bot; \Delta} \qquad \frac{\Theta;\Xi;\Gamma \vdash M : \bot; \Delta, \alpha:\tau}{\Theta;\Xi;\Gamma \vdash \mu\alpha.M : \tau; \Delta}$$

Table 3: Typing Judgments for the lazy λμ2-Calculus

## **4.1 Second-Order Configuration Types**

We now define a second-order typing system for the interaction structure Compλμ of λμ configurations. Its states (second-order configuration types) capture the totality of information about the types of the control term and environment, and the instantiations for type variables by both a program and its environment, which may be inferred by an external observer of their interaction.

**Definition 17.** *A* second-order configuration type *is a polarized* λμ2 *type-in-context — a tuple* $\Theta; \Xi; \Gamma \vdash p\tau; \Delta$*, where* Θ *is a polarized context of type variables,* Ξ *is a polarized equational context,* Γ *and* Δ *are polarized contexts of typed* λ*- and* μ*-variables, and* pτ *is a polarized* λμ2*-type (or* ⊥*), all over* Θ*.*

We place a further constraint — "polarized satisfiability" — on the configuration types which are permitted as states. This requires that their equational contexts can actually be satisfied by a program and environment successively instantiating type variables quantified positively and negatively (respectively), without knowing the types instantiated by the counterparty.

**Definition 18.** *A pre-valuation* V *for a polarized context of type variables* Θ positively satisfies *the polarized equational context* Θ ⊢ Ξ *(written* V ⊨⁺ Θ ⊢ Ξ*) if, for any pre-valuation* W *for* Θ*, the first formula in* Ξ not *satisfied by the valuation* (V ∪ W)∗ *for* |Θ| *(if any) is negative.* Θ ⊢ Ξ *is (polarized)* satisfiable *if both it and its dual (with all polarities reversed) are positively satisfiable. Note that this implies that the underlying context* |Θ| ⊢ |Ξ| *is satisfiable.*

Determining whether a polarized context is satisfiable is equivalent to a series of *conditional (first-order) unification* problems: these can be solved using the algorithm for first-order unification [11]. We place an equivalence relation on configuration types (cf. structural congruence of processes), allowing the principal type to be replaced by any of the (finitely many) types to which it is equivalent under Ξ.
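As an illustration of this reduction, the following is a minimal sketch (ours, not from the paper) of syntactic first-order unification in the style of [11]. The encoding is an assumption for illustration: strings stand for type variables and `("arrow", l, r)` tuples for non-atomic types; the polarity constraints discussed above are ignored.

```python
# Sketch: first-order unification over a toy type syntax.
# Strings are type variables; ("arrow", l, r) tuples are arrow types.
def unify(equations):
    """Return a substitution solving the equations, or None if unsolvable."""
    subst = {}

    def walk(t):
        # Chase variable bindings to a representative term.
        while isinstance(t, str) and t in subst:
            t = subst[t]
        return t

    def occurs(v, t):
        # Occurs check: does variable v appear in (the walked form of) t?
        t = walk(t)
        if t == v:
            return True
        return isinstance(t, tuple) and (occurs(v, t[1]) or occurs(v, t[2]))

    todo = list(equations)
    while todo:
        s, t = todo.pop()
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if isinstance(s, str):
            if occurs(s, t):
                return None          # no finite solution
            subst[s] = t
        elif isinstance(t, str):
            todo.append((t, s))      # orient the equation variable-first
        else:
            todo.append((s[1], t[1]))  # decompose arrows componentwise
            todo.append((s[2], t[2]))
    return subst
```

A conditional-unification problem as above would run this solver once per candidate case of the polarized context.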

**Definition 19.** (Θ; Ξ; Γ ⊢ <sup>p</sup>τ; Δ) ≈ (Θ; Ξ; Γ ⊢ <sup>p</sup>τ′; Δ) *if* Θ; Ξ ⊢ τ = τ′*.*

The (bipartite, nominal) LTS Tyλμ2 of λμ2 is given by the transition rules in Table 4.


To define a typing relation between configurations and λμ2-configuration types, we first define typing judgements Θ; Ξ; Γ ⊢ A : τ; Δ for control terms. In the case of programs and values, these are derived according to the rules in Table 3. For continuations, the rules

$$
\dfrac{}{\Theta;\Xi;\Gamma \vdash [\alpha]\bullet : \tau;\Delta[\alpha:\tau]}
\qquad
\dfrac{\Theta;\Xi;\Gamma \vdash K : \tau[\rho_1/X_1\ldots\rho_n/X_n];\Delta \quad \Theta;\Xi;\Gamma \vdash s : \sigma[\rho_1/X_1\ldots\rho_n/X_n];\Delta}{\Theta;\Xi;\Gamma \vdash K[\bullet\,s] : \forall X_1\ldots X_n.\sigma\rightarrow\tau;\Delta}
$$

are equivalent to typing Θ; Ξ; Γ ⊢ K : τ; Δ if Θ; Ξ; Γ, • : τ ⊢ K[•] : ⊥; Δ. The empty context has type ⊥ in any well-formed context.

$$
\begin{array}{lcl}
\Theta;\Xi;\Gamma \vdash {}^{p}\forall X_1\ldots X_n.\sigma\rightarrow\tau;\Delta
& \xrightarrow{p\langle x,\alpha\rangle} &
\Theta,{}^{p}X_1,\ldots,{}^{p}X_n;\Xi;\Gamma,{}^{p}x:\sigma \vdash {}^{\overline{p}}\bot;\Delta,{}^{p}\alpha:\tau\\[2pt]
\Theta;\Xi;\Gamma[{}^{\overline{p}}x:\tau] \vdash {}^{p}\bot;\Delta
& \xrightarrow{p\,x\langle\alpha\rangle} &
\Theta;\Xi;\Gamma \vdash {}^{\overline{p}}\bot;\Delta,{}^{p}\alpha:\tau\\[2pt]
\Theta;\Xi;\Gamma \vdash {}^{p}\bot;\Delta[{}^{\overline{p}}\alpha:\tau]
& \xrightarrow{p\,\alpha} &
\Theta;\Xi;\Gamma \vdash {}^{\overline{p}}\tau;\Delta\\[2pt]
\Theta;\Xi;\Gamma \vdash {}^{p}\sigma;\Delta[{}^{\overline{p}}\alpha:\tau]
& \xrightarrow{p\,\alpha} &
\Theta;\Xi,{}^{p}(\sigma=\tau);\Gamma \vdash {}^{\overline{p}}\bot;\Delta
\end{array}
$$

Table 4: Transitions of second-order configuration types

**Definition 20 (Typing Relation).** *Let* V *be a valuation for* Θ *which positively satisfies* Ξ*, and define* V ⊨ (E; A) ⦂ Θ; Ξ; Γ ⊢ <sup>p</sup>τ; Δ *if* E *is a pre-valuation for* Γ, Δ *such that* Θ⁻; V(Ξ⁻); V(Γ⁻) ⊢ E∗(A) : V(τ); V(Δ⁻)*; for each* x : σ ∈ Γ⁺*,* Θ⁻; V(Ξ⁻); V(Γ⁻) ⊢ E∗(x) : V(σ); V(Δ⁻)*; and for each* α : σ ∈ Δ⁺*,* Θ⁻; V(Ξ⁻); V(Γ⁻) ⊢ E∗(α) : V(σ); V(Δ⁻)*.*

*Define C* ⦂ T *if there exists a valuation* V *for* Θ *such that* V ⊨ *C* ⦂ T*.*

Note that if *C* ⦂ T and T ≈ T′ then *C* ⦂ T′, so typing is a well-defined relation from configurations to equivalence classes of configuration types.

**Proposition 4.** (Compλμ ⦂ Tyλμ2) *satisfies the subject reduction property.*

*Proof.* For the observable transitions, this is a straightforward observation that the typing relation is preserved. For internal transitions (specifically, β-reductions), we use the corresponding subject reduction property for λμ2 substitutions: if Θ; Ξ; Γ ⊢ K[(λx.t) s] : ⊥; Δ then Θ; Ξ; Γ ⊢ K[t[s/x]] : ⊥; Δ, and if Θ; Ξ; Γ ⊢ K[μα.t] : ⊥; Δ then Θ; Ξ; Γ ⊢ t[K/α] : ⊥; Δ.

Figure 2 gives an example illustrating the role of types in constraining behaviour: a trace of the value λf.f v ⦂ ∃X.X, where v is an arbitrary typable value (recall that ∃X.X may be encoded as ∀Y.(∀X.X → Y) → Y). Observe that there are no transitions from the final state: a call to γ is not possible because −Y′, +X′ ⊢ −(Y′ = X′) is not negatively satisfiable. In fact, the tree of transitions of ∃X.X branches only on negative transitions (i.e. Opponent moves). It follows that any configuration of this type will have the same set of transitions, and therefore that λf.f (λxy.x) ∼<sub>∃X.X</sub> λf.f (λxy.y), as proposed in the introduction.

### **4.2 A Second-Order Typed Interaction Structure**

It remains to prove that Tyλμ2 is a well-defined typing system for the interaction structure on Compλμ, and that typed bisimulation is therefore a congruence. We need to establish that the pointwise extension of the arrow relation (Definition 16) to second-order configuration types (i.e. T1 T2 → T3 if Θ1 Θ2 → Θ3, Ξ1 Ξ2 → Ξ3, Γ1 Γ2 → Γ3, Δ1 Δ2 → Δ3, and pτ1 pτ2 → pτ3) satisfies the conditions of Definition 5: that if *C*1 = (E1; A1) ⦂ T1 and *C*2 = (E2; A2) ⦂ T2, where T1 T2 → T3 and

$$
\begin{array}{l}
(\,;\lambda f.f\,v) \;⦂\; (\,;\,;\; \vdash {}^{-}(\forall X.X\rightarrow Y)\rightarrow Y;\,)\\
\quad\downarrow\; {-\langle g,\alpha\rangle}\\
(\,;[\alpha]g\,v) \;⦂\; ({}^{-}Y';\,;\; {}^{-}g:\forall X.X\rightarrow Y' \vdash {}^{+}\bot;\; {}^{-}\alpha:Y')\\
\quad\downarrow\; {+g\langle\beta\rangle}\\
((\beta\mapsto[\alpha]\bullet\,v);\bullet) \;⦂\; ({}^{-}Y';\,;\; {}^{-}g:\forall X.X\rightarrow Y' \vdash {}^{-}\bot;\; {}^{-}\alpha:Y',{}^{+}\beta:\forall X.X\rightarrow Y')\\
\quad\downarrow\; {-\beta}\\
((\beta\mapsto[\alpha]\bullet\,v);[\alpha]\bullet\,v) \;⦂\; ({}^{-}Y';\,;\; {}^{-}g:\forall X.X\rightarrow Y' \vdash {}^{+}\forall X.X\rightarrow Y';\; {}^{-}\alpha:Y')\\
\quad\downarrow\; {+\langle z,\gamma\rangle}\\
((\beta\mapsto[\alpha]\bullet\,v),(z\mapsto v),(\gamma\mapsto[\alpha]\bullet);\bullet) \;⦂\; ({}^{-}Y',{}^{+}X';\,;\; {}^{-}g:\forall X.X\rightarrow Y',{}^{+}z:X' \vdash {}^{-}\bot;\; {}^{-}\alpha:Y',{}^{+}\gamma:Y')\\
\quad\downarrow\; {-z\langle\delta\rangle}\\
((\beta\mapsto[\alpha]\bullet\,v),(z\mapsto v),(\gamma\mapsto[\alpha]\bullet);[\delta]v) \;⦂\; ({}^{-}Y',{}^{+}X';\,;\; {}^{-}g:\forall X.X\rightarrow Y',{}^{+}z:X' \vdash {}^{+}\bot;\; {}^{-}\alpha:Y',{}^{+}\gamma:Y',{}^{-}\delta:X')\\
\quad\downarrow\; {+\delta}\\
((\beta\mapsto[\alpha]\bullet\,v),(z\mapsto v),(\gamma\mapsto[\alpha]\bullet);v) \;⦂\; ({}^{-}Y',{}^{+}X';\,;\; {}^{-}g:\forall X.X\rightarrow Y',{}^{+}z:X' \vdash {}^{-}X';\; {}^{-}\alpha:Y',{}^{+}\gamma:Y')
\end{array}
$$

Fig. 2: Trace of λf.f v : ∃X.X

sup(*C*1)∩sup(*C*2) ⊆ sup(T1), then C1|C<sup>2</sup> is well-defined, has type T<sup>3</sup> and satisfies the interaction conditions.

By Proposition 2, *C*1|*C*2 = (E1 ∪ E2; A1|A2) is a well-defined configuration, and E = E1 ∪ E2 is a pre-valuation for Γ3 ∪ Δ3. By the assumption that *C*1 ⦂ T1 and *C*2 ⦂ T2, there are valuations V1 ⊨ *C*1 ⦂ T1 and V2 ⊨ *C*2 ⦂ T2. Then V = V1 ∪ V2 is a pre-valuation for Θ3. To show that V∗ ⊨ *C*1|*C*2 ⦂ T3, we need to verify that:

**Lemma 3.** V *positively satisfies* Ξ3*.*

*Proof.* Let W be a pre-valuation for Θ3. The first formula in Ξ3 (if any) which is not satisfied by V ∪ W = V1 ∪ V2 ∪ W cannot be positive in Ξ1 (positively satisfied by V1) nor in Ξ2 (positively satisfied by V2), and so must be a negative formula in Ξ3.

**Lemma 4.** Θ3⁻; V∗(Ξ3); V∗(Γ3⁻) ⊢ E∗(A1|A2) : V(τ); V∗(Δ3⁻)

*Proof.* Observe that E∗ = (E2∗ · E1∗)ⁱ and V = (V2 · V1)ⁱ for some i ≤ n. Hence, it suffices to prove by induction on i that Θ2; (V2 · V1)ⁱ(Ξ2); (V2 · V1)ⁱ(Γ2⁻) ⊢ (E2∗ · E1∗)ⁱ(A1|A2); (V2 · V1)ⁱ(Δ2⁻).

Similarly, each term and continuation assigned to an output variable is well-typed under closure by V∗ and E∗, and thus:

**Proposition 5.** *C*1|*C*<sup>2</sup> ⦂ T3*.*

It remains to show that the interaction conditions of Definition 5 are satisfied. The key is establishing condition 1: that if *C*1 <sup>pl</sup>→ *C*1′ and *C*2 <sup>p̄l</sup>→ *C*2′ then T1 <sup>pl</sup>→ T1′ and T2 <sup>p̄l</sup>→ T2′ such that T1′ T2′ → T3. This requires some further investigation of configuration types.

The interesting cases are those where A1 ≡ λx.t and A2 ≡ K[•s] (or vice versa), so that they can perform the complementary actions −⟨y, α⟩ and +⟨y, α⟩. We need to show that |Θ1|; |Ξ1| ⊢ τ is *non-atomic*, that is, |Θ1|; |Ξ1| ⊢ τ = ∀X1 ...Xm.ρ → σ for some ρ, σ. Observe that this implies that |Θ2|; |Ξ2| ⊢ τ is also non-atomic (since Ξ2 contains the equations in Ξ1), so that T1 and T2 can perform the complementary actions −⟨y, α⟩ and +⟨y, α⟩.

Since any derivation of a typing judgement for λx.t or K[•s] must conclude with →-introduction followed by applications of the type-equality rule, we have:

**Lemma 5.** *If* Θ; Ξ; Γ ⊢ λx.t : τ; Δ *or* Θ; Ξ; Γ ⊢ K[•s] : τ; Δ *then* Θ; Ξ ⊢ τ *is non-atomic.*

Hence, by the assumption that (E1; λx.t) ⦂ (Θ1; Ξ1; Γ1 ⊢ ⁻τ; Δ1) and (E2; K[•s]) ⦂ (Θ2; Ξ2; Γ2 ⊢ ⁺τ; Δ2), we know that Θ1; V1(Ξ1) ⊢ V1(τ) and Θ2; V2∗(Ξ2) ⊢ V2∗(τ) are non-atomic. From the latter we may infer that Θ1; V2∗(Ξ1) ⊢ V2∗(τ) is non-atomic, since Θ2 and Ξ2 are interleavings of Θ1 and Ξ1 with the disjoint contexts Θ3 and Ξ3.

So, to show that |Θ1|; |Ξ1| ⊢ τ is non-atomic, it is sufficient to prove the contrapositive.

**Lemma 6.** *Suppose* V⁺ *positively satisfies* Θ ⊢ Ξ *and* V⁻ *negatively satisfies* Θ ⊢ Ξ*, where* |Θ|; |Ξ| ⊢ τ *is atomic. Then either* Θ⁻; V⁺(Ξ) ⊢ V⁺(τ) *or* Θ⁺; V⁻(Ξ) ⊢ V⁻(τ) *is atomic.*

*Proof.* We extend the grammar of types with an unbounded set of "neutral atoms" A, B, C, . . ., which are equal only if syntactically identical, and prove the lemma for this extended set of types by an outer induction on the size of Θ, and an inner induction on the sum of the lengths of the types in Ξ.

At least one of V⁺(τ) and V⁻(τ) must be atomic, and so if Ξ is empty then the hypothesis holds. Otherwise, Ξ ≡ p(σ = σ′), Ξ′ for some types σ, σ′, equational context Ξ′ over Θ, and polarity p ∈ {+, −}.

If σ and σ′ are both non-atomic, then by satisfiability σ ≡ ∀X1 ...Xn.ρ1 → ρ2 and σ′ ≡ ∀X1 ...Xn.ρ1′ → ρ2′ for some ρ1, ρ2, ρ1′, ρ2′. Letting A1,...,An be fresh, distinct atomic types, define ρ̄ = ρ[A1/X1,...,An/Xn]. The equational context Ξ′′ = p(ρ̄1 = ρ̄1′), p(ρ̄2 = ρ̄2′), Ξ′ is equivalent to (satisfied by the same valuations as) Ξ, and so Θ; Ξ′′ ⊢ τ is atomic, and positively and negatively satisfied by V⁺ and V⁻. Hence, by the inner induction hypothesis, one of Θ⁻; V⁺(Ξ′′) ⊢ V⁺(τ) or Θ⁺; V⁻(Ξ′′) ⊢ V⁻(τ) is atomic.

Otherwise at least one of σ and σ′ is atomic. If σ ≡ σ′, then we may discard the tautology σ = σ′ and apply the (inner) inductive hypothesis to Θ; Ξ′ ⊢ τ. Otherwise at least one of σ, σ′ must be a type variable with polarity p in Θ (none of the other cases are p-satisfiable). So assume without loss of generality that Θ ≡ Θ′, pX, Θ′′ and Ξ ≡ p(σ = X), Ξ′. We may show that:

**–** Θ′, Θ′′; Ξ′[σ/X] ⊢ τ[σ/X] is atomic.

**–** Θ′, Θ′′ ⊢ Ξ′[σ/X] is positively satisfied by V⁺ and negatively satisfied by V⁻.

So by the outer inductive hypothesis, either (Θ′, Θ′′)⁻; V⁺(Ξ′[σ/X]) ⊢ V⁺(τ) or (Θ′, Θ′′)⁺; V⁻(Ξ′[σ/X]) ⊢ V⁻(τ) is atomic, and hence either Θ⁻; V⁺(Ξ) ⊢ V⁺(τ) or Θ⁺; V⁻(Ξ) ⊢ V⁻(τ) is atomic.

We have shown that the arrow relation satisfies the first interaction condition. Conditions 2 and 3 are straightforward to verify, establishing that (Compλμ2 ⦂ Tyλμ2) is a well-defined typed interaction structure. Therefore, by Proposition 1, typed bisimulation is preserved by parallel composition plus hiding, and thus:

**Theorem 1.** *Typed bisimulation is a congruence for the* λμ2*-calculus.*

## **5 Conclusions and Further Directions**

We have described a "Curry-style" approach to game semantics, and used it to give new models of polymorphism. Various existing models may also be framed as typed interaction systems, such as the semantics of call-by-value in [12]. Nor are instances restricted to operational game semantics: for example we can present linear combinatory algebras of games and strategies in this way, and potentially other models of concurrent interaction. Unlike basic Church-style game semantics, these models give the opportunity to make finer distinctions between programs based on internal behaviour, which we have not explored here.

The notion of typed interaction structure reflects only limited structure of our models, but may be developed further. Having characterized parallel composition plus hiding within this setting, a natural next step would be a notion of copycat strategy, leading to structure for sharing and discarding information. One goal for such a development would be to put the generalization of congruence from configurations to terms on a systematic footing.

In another direction, our models of polymorphism may be developed further. In particular, combining and fully exploiting generic and abstract data types often requires *higher-order* polymorphism, in which quantifiers range over *type operators* (functions which take types as arguments and return types as results). Whereas this is difficult to represent in game semantics, our model readily extends to a typing system based on System Fω, which allows quantification over type operators: the price to pay is that satisfiability of configuration types (and thus effective presentation of the states of our LTS) requires the solution of higher-order unification problems, which are undecidable in general.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **An Axiomatic Approach to Reversible Computation**<sup>⋆</sup>

Ivan Lanese<sup>1</sup>(✉), Iain Phillips<sup>2</sup>, and Irek Ulidowski<sup>3</sup>

<sup>1</sup> Focus Team, University of Bologna/INRIA, Italy ivan.lanese@gmail.com

<sup>2</sup> Imperial College London, England i.phillips@imperial.ac.uk

<sup>3</sup> University of Leicester, England i.ulidowski@leicester.ac.uk

**Abstract.** Undoing computations of a concurrent system is beneficial in many situations, e.g., in reversible debugging of multi-threaded programs and in recovery from errors due to optimistic execution in parallel discrete event simulation. A number of approaches have been proposed for how to reverse formal models of concurrent computation including process calculi such as CCS, languages like Erlang, prime event structures and occurrence nets. However it has not been settled what properties a reversible system should enjoy, nor how the various properties that have been suggested, such as the parabolic lemma and the causal-consistency property, are related. We contribute to a solution to these issues by using a generic labelled transition system equipped with a relation capturing whether transitions are independent to explore the implications between these properties. In particular, we show how they are derivable from a set of axioms. Our intention is that when establishing properties of some formalism it will be easier to verify the axioms rather than proving properties such as the parabolic lemma directly. We also introduce two new notions related to causal consistent reversibility, namely causal safety and causal liveness, and show that they are derivable from our axioms.

**Keywords:** Reversible Computation, Labelled Transition System with Independence, Causal Safety, Causal Liveness

## **1 Introduction**

Reversible computing studies computations which can proceed both in the standard, forward direction, and backward, going back to past states. Reversible computation has attracted interest due to its applications in areas as different as low-power computing [15], simulation [4], robotics [21], biological modelling [31] and debugging [23].

© The Author(s) 2020

<sup>⋆</sup> This work has been partially supported by COST Action IC1405 on Reversible Computation - Extending Horizons of Computing. The first author has also been partially supported by French ANR project DCore ANR-18-CE25-0007 and by INdAM as a member of GNCS (Gruppo Nazionale per il Calcolo Scientifico).

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 442–461, 2020. https://doi.org/10.1007/978-3-030-45231-5_23

There is widespread agreement in the literature about what properties characterise reversible computation in the sequential setting. Thus in reversible finite state automata [32], reversible cellular automata [13], reversible Turing machines [2] and reversible programming languages such as Janus [35] the main point is that the mapping from inputs to outputs is injective, and the reverse computation is deterministic.

Matters are less clear when it comes to reversible computation in the concurrent setting. Indeed, various reversible concurrent models have been studied, most notably in the areas of process calculi [6,29,18], event structures [34], Petri nets [1,25] and programming languages such as Erlang [20].

A main result of this line of research is that the notion of reversibility most suited for concurrent systems is *causal-consistent reversibility* (other notions are also used, e.g., to model biological systems [31]). According to an informal account of causal-consistent reversibility, any action can be undone provided that its consequences, if any, are undone beforehand. Following [6], this account is formalised using the notion of causal equivalent traces: two traces are causal equivalent if and only if they differ only by swapping independent actions and by inserting or removing pairs of an action and its reverse. According to [6, Section 3]

Backtracking an event is possible when and only when a causally equivalent trace would have brought this event as the last one

which is then formalised as the so-called causal consistency (CC) [6, Theorem 1], stating that coinitial computations are causal equivalent if and only if they are cofinal. Our new proof of CC (Proposition 3.6) shows that it holds in essentially any reversible formalism satisfying the Loop Lemma and the Parabolic Lemma, and we believe that CC is insufficient on its own to capture the informal notion.

A formalisation closer to the informal statement above is provided in [20, Corollary 22], stating that a forward transition t can be undone after a derivation iff all its consequences, if any, are undone beforehand. We are not aware of other discussions trying to formalise such a notion, except for [30], in the setting of reversible event structures. In [30], a reversible event structure is *cause-respecting* if an event cannot be reversed until all events it has caused have also been reversed; it is *causal* if it is cause-respecting and a reversible event can be reversed if all events it has caused have been reversed [30, Definition 3.34].

We provide (Section 4) a novel definition of the idea above, comprising:


We shall see that CC does not capture the same property as CS+CL (Examples 4.15, 4.37), and that there are slightly different versions of CS and CL, which can all be proved under a small set of reasonable assumptions.

The main aim of this paper is to take an abstract model, namely labelled transition systems with independence equipped with reverse transitions (Section 2), and to show that the properties above (as well as others) can be derived


**Table 1.** Axioms and properties for causal reversibility.

from a small set of simple axioms (Sections 3, 4, 5). This is in sharp contrast with the large part of works in the literature, which consider specific frameworks such as CCS [6], CCS with broadcast [26], CCB [14], π-calculus [5], higher-order π [18], Klaim [11], Petri nets [25], μOz [22] and Erlang [20], and all give similar but formally unrelated proofs of the same main results. Such proofs will become instances of our general results. More precisely, our axioms will:


Thus, when defining a new reversible formalism, one just has to check whether the axioms hold, and get for free the proofs of the most relevant properties. Notably, the axioms are normally easier to prove than the properties, hence the assessment of a reversible calculus gets much simpler.

As a reference, Table 1 lists the axioms and properties used in this paper.

In order to understand which kinds of behaviours are incompatible with a causal-consistent reversible setting, consider the following LTSs in CCS:

a.**0** <sup>a</sup>→ **0** and b.**0** <sup>b</sup>→ **0**: from state **0** one does not know whether to go back to a.**0** or to b.**0**;


We remark that all such behaviours are perfectly reasonable in CCS, and they are dealt with in the reversible setting by adding history information about past actions. For example, in the first case one could remember the initial state, in the second case both the initial state and the action taken, and in the last case the number of iterations that have been performed.
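The idea of adding history information can be sketched generically as follows (a sketch of ours, not from any of the cited calculi): log each action together with the state it left, so that backward steps become deterministic even when several backward transitions would otherwise be possible.

```python
# Sketch: making backward steps deterministic by logging history.
# The state representation (plain strings here) is an assumption for
# illustration, not taken from the paper.
class HistoryProcess:
    def __init__(self, state):
        self.state = state
        self.history = []          # stack of (action, previous state)

    def do(self, action, new_state):
        # Forward step: record the action and the state it left.
        self.history.append((action, self.state))
        self.state = new_state

    def undo(self):
        # Backward step: deterministic, since the log fixes the target.
        action, previous = self.history.pop()
        self.state = previous
        return action              # the action that was undone
```

In the first example above, from state **0** the log records whether a or b was performed, so the backward step is uniquely determined.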

Due to space constraints, some proofs and additional results can only be found in the companion technical report [16].

## **2 Labelled Transition Systems with Independence**

We want to study reversibility in a setting as general as possible. Thus, we build on the core of the notion of *labelled transition system with independence* (LTSI) [33, Definition 3.7]. However, while [33] requires a number of axioms on LTSIs, we take the basic definition and explore what can be done by adding or not adding various axioms. Also, we extend LTSIs with reverse transitions, since we study reversible systems. We define first labelled transition systems (LTSs).

We consider the LTS of the entire set of processes in a calculus, rather than the transition graph of a particular process and its derivatives, hence we do not fix an initial state.

**Definition 2.1.** *A* labelled transition system (LTS) *is a structure* (Proc, Lab, →)*, where* Proc *is the set of states (or processes),* Lab *is the set of action labels and* → ⊆ Proc × Lab × Proc *is a* transition relation*.*

We let P, Q, . . . range over processes, a, b, c, . . . range over labels, and t, u, v, . . . range over transitions. We write t : P <sup>a</sup>→ Q to denote that t = (P, a, Q). We call a transition with label a an a-transition.

**Definition 2.2 (LTS with independence).** *We say that* (Proc, Lab,→, ι) *is an* LTS with independence *(LTSI) if* (Proc, Lab,→) *is an LTS and* ι *is an irreflexive symmetric binary relation on transitions.*

In many cases (see Section 6), the notion of independence coincides with the notion of concurrency. However, this is not always the case. Indeed, concurrency implies that transitions are independent since they happen in different processes, but transitions taken by the same process can be independent as well. Think, for instance, of a reactive process that may react in any order to two events arriving at the same time, where the final result does not depend on the order of reactions.

We shall assume that all transitions are reversible, so that the Loop Lemma [6, Lemma 6] holds. This does not hold in models of reversibility with control mechanisms such as irreversible actions [6,7] or a rollback operator [17]. Nevertheless,

when showing properties of models with controlled reversibility it has proved sensible to first consider the underlying models where all transitions are reversible, and then study how control mechanisms change the picture [11,20]. The present work helps with the first step.

**Definition 2.3.** *Given* (Proc, Lab, →)*, let the* reverse LTS *be* (Proc, Lab, ⇝)*, where* P <sup>a</sup>⇝ Q *iff* Q <sup>a</sup>→ P*. It is convenient to combine the two LTSs (forward and reverse): let the reverse labels be* L̲a̲b̲ = {a̲ : a ∈ Lab}*, and define the combined LTS* → ⊆ Proc × (Lab ∪ L̲a̲b̲) × Proc *by* P <sup>a</sup>→ Q *iff* P <sup>a</sup>→ Q *in the forward LTS, and* P <sup>a̲</sup>→ Q *iff* P <sup>a</sup>⇝ Q*.*

We stipulate that the union Lab ∪ L̲a̲b̲ is disjoint. We let α, . . . range over Lab ∪ L̲a̲b̲. For α ∈ Lab ∪ L̲a̲b̲, the *underlying* action label und(α) is defined by und(a) = a and und(a̲) = a. We stipulate that the reverse of a̲ is a, for a ∈ Lab, so that reversing labels is an involution. Given t : P <sup>α</sup>→ Q, let t̲ : Q <sup>α̲</sup>→ P be the transition which reverses t.

We let ρ, σ, . . . range over finite sequences α1 ...αn, with εP representing the empty sequence starting and ending at P. We write ε when P is understood. Given an LTS, a *path* is a sequence of forward or reverse transitions of the form P0 <sup>α1</sup>→ P1 ··· <sup>αn</sup>→ Pn. We let r, s, . . . range over paths. We may write r : P <sup>ρ</sup>→∗ Q where the intermediate states are understood. On occasion we may refer to a path simply by its sequence of labels ρ. Given a path r : P <sup>ρ</sup>→∗ Q, the inverse path is r̲ : Q <sup>ρ̲</sup>→∗ P, where ε̲ = ε and (αρ)̲ = ρ̲ α̲. The length of a path r (notated |r|) is the number of transitions in the path. Paths r : P <sup>ρ</sup>→∗ Q and s : R <sup>σ</sup>→∗ S are *coinitial* if P = R and *cofinal* if Q = S. We say that a path is *forward-only* if it contains no reverse transitions.
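Under the assumption that a combined-LTS label is encoded as a pair `(a, rev)`, with `rev = True` marking a reverse label, the operations on labels and paths above can be sketched as:

```python
# Sketch (assumed encoding, ours): a combined-LTS label is (name, rev),
# with rev=True for the reverse label of name.
def und(label):
    a, _ = label
    return a                        # underlying action label

def bar(label):
    a, rev = label
    return (a, not rev)             # the reverse of a label; bar is an involution

def inverse(path):
    # Inverse path: reverse the order and reverse each label,
    # matching (αρ)-inverse = (inverse of ρ) followed by (reverse of α).
    return [bar(l) for l in reversed(path)]
```

For example, the inverse of the inverse of any path is the path itself, and inversion preserves length.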

Let (Proc, Lab, →) be an LTS. The irreversible processes in (Proc, Lab, →) are Irr = {P ∈ Proc : P has no reverse transitions}. A *rooted path* is a path r : P <sup>ρ</sup>→∗ Q such that P ∈ Irr.

In the following we will consider LTSIs obtained by adding a notion of independence to combined LTSs as above. We will call the result a *combined LTSI*.

## **3 Basic Properties**

In this section we show that most of the properties in the reversibility literature (see, e.g., [6,29,18,20]), in particular the parabolic lemma and causal consistency, can be proved under minimal assumptions on the combined LTSI under analysis.

We formalise the minimal assumptions using three axioms, described below.

**Definition 3.1 (Basic axioms).** *Let* <sup>L</sup> = (Proc, Lab,→, ι) *be a combined LTSI. We say* L *satisfies:*

**Square Property (SP)** *if whenever* t : P <sup>α</sup>→ Q *and* u : P <sup>β</sup>→ R *with* t ι u*, then there are cofinal transitions* u′ : Q <sup>β</sup>→ S *and* t′ : R <sup>α</sup>→ S*;*

**Backward Transitions are Independent (BTI)** *if whenever* t : P <sup>a̲</sup>→ Q *and* t′ : P <sup>b̲</sup>→ R *with* t′ ≠ t*, then* t ι t′*;*

**Well-Foundedness (WF)** *if there is no infinite reverse computation, i.e. there are no processes* Pi *(not necessarily distinct) such that* Pi+1 <sup>ai</sup>→ Pi *for all* i = 0, 1, . . .*.*

WF can alternatively be formulated using backward transitions, but the current formulation makes sense also in non-reversible calculi (e.g., CCS), which can be used as a comparison. Let us discuss the intuition behind these axioms. SP takes its name from the Square Lemma, where it is proved for concrete calculi and languages in [6,18,20], and captures the idea that independent transitions can be executed in any order, that is, they form commuting diamonds. SP can be seen as a sanity check on the chosen notion of independence. BTI generalises the key notion of backward determinism used in sequential reversibility (see, e.g., [32] for finite state automata and [35] for the imperative language Janus) to a concurrent setting. Backward determinism can be phrased as "two coinitial backward transitions coincide". This can be generalised to "two coinitial backward transitions are independent". Finally, WF means that we consider systems which have a finite past. That is, we consider systems starting from some initial state and then moving forward and back.
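For a *finite* combined LTSI, SP and BTI can be checked exhaustively. The following sketch is ours (not from the paper), with assumed encodings: transitions as `(P, label, Q)` triples, labels as `(name, rev)` pairs with `rev=True` for backward moves, and independence as a collection of pairs of transitions.

```python
# Sketch: brute-force checks of the SP and BTI axioms on a finite LTSI.
from itertools import combinations

def holds_sp(transitions, indep):
    """SP: independent coinitial t, u close to a commuting square."""
    trans = set(transitions)
    targets = {x[2] for x in trans}
    for t, u in indep:
        if t[0] != u[0]:
            continue                       # SP only constrains coinitial pairs
        (_, a, q), (_, b, r) = t, u
        if not any((q, b, s) in trans and (r, a, s) in trans for s in targets):
            return False
    return True

def holds_bti(transitions, indep):
    """BTI: distinct coinitial backward transitions are independent."""
    pairs = {frozenset(p) for p in indep}
    backward = [t for t in transitions if t[1][1]]      # rev flag set
    return all(frozenset((t, u)) in pairs
               for t, u in combinations(backward, 2) if t[0] == u[0])
```

On the commuting diamond generated by two independent actions a and b (four states, with all transitions reversible), both checks succeed; removing the pair of coinitial backward transitions from the independence relation makes BTI fail.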

Axioms SP and BTI are related to properties which are part of the definition of (occurrence) transition systems with independence in [33, Definitions 3.7, 4.1]. WF was used as an axiom in [28].

Using the minimal assumptions above we can prove relevant results from the literature. We first define causal equivalence, equating computations which differ only by swaps of independent transitions and by simplification of a transition with its reverse.
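The simplification ingredient, cancelling a transition against its immediately following reverse, can be sketched on label sequences (a simplification of ours for illustration: true cancellation applies to transitions, so matching on labels alone is only an approximation):

```python
# Sketch: repeatedly cancel adjacent inverse pairs in a label sequence.
# Labels are (name, rev) pairs; a label followed by the same name with the
# opposite rev flag is removed, as in the cancellation clause of causal
# equivalence (the square-swapping clause and the states are ignored here).
def cancel(path):
    out = []
    for step in path:
        if out and out[-1][0] == step[0] and out[-1][1] != step[1]:
            out.pop()              # t immediately followed by its reverse: drop both
        else:
            out.append(step)
    return out
```

The stack-based loop handles nested cancellations, e.g. a b b̲ a̲ reduces to the empty sequence.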

**Definition 3.2 (cf. [6]).** *Let* (Proc, Lab, →, ι) *be an LTSI satisfying SP. Let* ≈ *be the smallest equivalence relation on paths closed under composition and satisfying:*

1. *if* $t : P \xrightarrow{\alpha} Q$ *and* $u : P \xrightarrow{\beta} R$ *with* $t \mathrel{\iota} u$*, and* $u' : Q \xrightarrow{\beta} S$*,* $t' : R \xrightarrow{\alpha} S$ *are the cofinal transitions given by SP, then* $t u' \approx u t'$*;*
2. $t \overline{t} \approx \varepsilon$ *and* $\overline{t} t \approx \varepsilon$*, where* $\overline{t}$ *is the reverse of* $t$ *and* $\varepsilon$ *is the empty path.*

We first consider the Parabolic Lemma ([6, Lemma 10]), which states that each path is causally equivalent to a backward path followed by a forward path.

**Definition 3.3. Parabolic Lemma (PL)***: for any path* $r$ *there are forward-only paths* $s, s'$ *such that* $r \approx \overline{s}\,s'$ *and* $|s| + |s'| \le |r|$*.*

**Proposition 3.4.** *Suppose an LTSI satisfies BTI and SP. Then PL holds.*

The proof of Proposition 3.4 (available in [16]) is very similar to that of [6, Lemma 10] except that in the latter BTI is shown directly as part of the proof.
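The proof idea behind PL can be phrased as a rewriting procedure. As a hedged sketch, consider the special case of a free commuting system in which all distinct actions are pairwise independent and always enabled (a hypothetical setting, chosen so that the swap rule needs no side conditions): repeatedly cancelling a forward step with its immediate reverse, and swapping a forward step past an immediately following backward step, yields a backward prefix followed by a forward suffix without ever lengthening the path.

```python
def parabolic(path):
    """Rewrite a path into a backward prefix followed by a forward
    suffix, as in the Parabolic Lemma. Steps are "+x" (do x) or "-x"
    (undo x); in this free commuting system every pair of distinct
    actions is independent, so the two causal-equivalence rules apply
    unconditionally:
        +x -x  ->  (empty)   cancel a step with its immediate reverse
        +x -y  ->  -y +x     swap independent steps (x != y)
    Both rules preserve causal equivalence; neither lengthens the path.
    """
    path = list(path)
    changed = True
    while changed:
        changed = False
        for i in range(len(path) - 1):
            f, b = path[i], path[i + 1]
            if f[0] == "+" and b[0] == "-":
                if f[1:] == b[1:]:
                    del path[i:i + 2]            # cancellation
                else:
                    path[i], path[i + 1] = b, f  # swap (BTI + SP)
                changed = True
                break
    return path

print(parabolic(["+a", "-b", "+b", "-a", "+c"]))  # -> ['-b', '+b', '+c']
```

Termination follows because each cancellation shortens the path and each swap strictly decreases the number of forward-before-backward inversions.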

A corollary of PL is that if a process is reachable from an irreversible process, then it is also forwards reachable from it. In other words, making a system reversible does not introduce new reachable states, but only allows one to explore forwards-reachable states in different ways. This is relevant in reversible debugging of concurrent systems [10,20], where one wants to find bugs that actually occur in forward-only computations. See the companion technical report [16, Corollary A.1]. We now move to causal consistency [6, Theorem 1].

**Definition 3.5. Causal Consistency (CC)***: if* $r$ *and* $s$ *are coinitial and cofinal then* $r \approx s$*.*

Essentially, causal consistency states that history information allows one to distinguish computations which are not causally equivalent: indeed, if two coinitial computations are cofinal, that is, they reach the same final state (which includes the stored history information), then they must be causally equivalent.

Formulations of causal consistency frequently include the other direction as well, namely that coinitial causally equivalent computations are cofinal, meaning that there is no way to distinguish causally equivalent computations. This second direction follows easily from the definition of causal equivalence.

Notably, our proof of CC below is very much shorter than existing proofs.

**Proposition 3.6.** *Suppose an LTSI satisfies WF and PL. Then CC holds.*

*Proof.* Let $r : P \xrightarrow{\rho}{}^{*} Q$ and $r' : P \xrightarrow{\rho'}{}^{*} Q$. Using WF, let $I, s$ be such that $s : I \xrightarrow{\sigma}{}^{*} P$ with $I \in \mathsf{Irr}$, the set of irreversible processes (those with no backward transitions). Now $s\,r\,\overline{r'}\,\overline{s}$ is a path from $I$ to $I$, and so by PL there are forward-only paths $r_1, r_2$ such that $\overline{r_1}\,r_2 \approx s\,r\,\overline{r'}\,\overline{s}$. But $I \in \mathsf{Irr}$, and so $r_1 = \varepsilon$ and $r_2 = \varepsilon$. Thus $\varepsilon \approx s\,r\,\overline{r'}\,\overline{s}$, so that $s\,r \approx s\,r'$ and $r \approx r'$ as required. 

Causal consistency implies the unique transition property.

**Definition 3.7.** *An LTSI* (Proc, Lab, →, ι) *satisfies Unique Transition (UT) if* $P \xrightarrow{a} Q$ *and* $P \xrightarrow{b} Q$ *imply* $a = b$*.*

**Corollary 3.8.** *If an LTSI satisfies CC then it satisfies UT.*

UT was shown in the forward-only setting of occurrence TSIs in [33, Corollary 4.4]; it was taken as an axiom in [28].

*Example 3.9 (PL alone does not imply WF or CC).* Consider the LTSI with states $P_i$ for $i = 0, 1, \ldots$ and transitions $t_i : P_{i+1} \xrightarrow{a} P_i$, $u_i : P_{i+1} \xrightarrow{b} P_i$ with $a \neq b$ and $t_i \mathrel{\iota} u_i$. BTI and SP hold. Hence PL holds by Proposition 3.4. However, clearly WF fails. Also $t_i$ and $u_i$ are coinitial and cofinal, and $a \neq b$, so that UT fails, and hence CC fails by Corollary 3.8. Note that the $ab$ diamonds here have the same side states and so are degenerate (cf. Lemma 4.4).

## **4 Causal Safety and Causal Liveness**

In the literature, causal-consistent reversibility is frequently described informally by saying that "a transition can be undone if and only if each of its consequences, if any, has been undone". In this section we study this property; the two implications will be referred to as causal safety and causal liveness. We provide three different versions of these properties, based on independence of transitions (Section 4.2), ordering of events (Section 4.3), and independence of events (Section 4.4), and study their relationships. In order to define these properties we need the concept of event.

#### **4.1 Events**

**Definition 4.1 (Event, general definition).** *Let* (Proc, Lab, →, ι) *be an LTSI. Let* ∼ *be the smallest equivalence relation satisfying: if* $t : P \xrightarrow{\alpha} Q$*,* $u : P \xrightarrow{\beta} R$*,* $u' : Q \xrightarrow{\beta} S$*,* $t' : R \xrightarrow{\alpha} S$*, and* $t \mathrel{\iota} u$*,* $u' \mathrel{\iota} \overline{t}$*,* $\overline{t'} \mathrel{\iota} \overline{u'}$*,* $\overline{u} \mathrel{\iota} t'$*, then* $t \sim t'$*. The equivalence classes of forward transitions, written* $[P, a, Q]$*, are the* events*. The equivalence classes of reverse transitions, written* $[P, \underline{a}, Q]$*, are the* reverse events*. Define a labelling function* $\ell$ *from* ${\to}/{\sim}$ *to* Lab *by setting* $\ell([P, \alpha, Q]) = \alpha$*.*

Events are introduced as a derived notion in an LTS with independence in [33], in the context of forward-only computation. We have changed their definition by using coinitial independence at all corners of the diamond, yielding rotational symmetry. This reflects our view that forward and backward transitions have equal status.

Our definition can be simplified if the LTSI, and independence in particular, are well-behaved. Thus, we now add a further axiom related to independence.

**Definition 4.2 (Coinitial Propagation of Independence (CPI)).** *If* $t : P \xrightarrow{\alpha} Q$*,* $u : P \xrightarrow{\beta} R$*,* $u' : Q \xrightarrow{\beta} S$ *and* $t' : R \xrightarrow{\alpha} S$ *with* $t \mathrel{\iota} u$*, then* $u' \mathrel{\iota} \overline{t}$*.*

CPI states that independence is a property of commuting diamonds rather than of their specific pairs of edges: indeed, it allows independence to propagate around a commuting diamond.

**Definition 4.3.** *If a combined LTSI satisfies axioms SP, BTI, WF and CPI, we say that it is* pre-reversible*.*

The name 'pre-reversible' indicates that we expect to require further axioms, but the present four are enough to ensure that LTSIs are well-behaved, with events compatible with causal equivalence. Pre-reversible axioms are separated from further axioms by a dashed line in Table 1.

The following non-degeneracy property was shown for occurrence transition systems with independence in [33, page 312], which have forward transitions only. We have to cope with backwards as well as forward transitions.

**Lemma 4.4.** *Suppose that an LTSI is pre-reversible. If we have a diamond* $t : P \xrightarrow{\alpha} Q$*,* $u : P \xrightarrow{\beta} R$ *with* $t \mathrel{\iota} u$*, together with cofinal transitions* $u' : Q \xrightarrow{\beta} S$ *and* $t' : R \xrightarrow{\alpha} S$*, then the diamond is* non-degenerate*, meaning that* $P, Q, R, S$ *are distinct states.*

If an LTSI is pre-reversible then by Lemma 4.4 and the use of CPI we can simplify the statement of Definition 4.1 to:

**Definition 4.5 (Event, simplified definition).** *Let* (Proc, Lab, →, ι) *be a pre-reversible LTSI. Let* ∼ *be the smallest equivalence relation satisfying: if* $t : P \xrightarrow{\alpha} Q$*,* $u : P \xrightarrow{\beta} R$*,* $u' : Q \xrightarrow{\beta} S$*,* $t' : R \xrightarrow{\alpha} S$*, and* $t \mathrel{\iota} u$*, then* $t \sim t'$*.*
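On a finite pre-reversible LTSI, this simplified definition can be computed directly: union two forward transitions whenever they form opposite sides of a commuting square whose coinitial corner transitions are independent. The following sketch runs this on a hypothetical 2x2 grid over two independent actions a and b (a toy system, not an example from the paper).

```python
# Hypothetical 2x2 grid over independent actions a, b (forward part only).
FWD = {("00", "a", "10"), ("01", "a", "11"),
       ("00", "b", "01"), ("10", "b", "11")}
STATES = {p for (p, _, q) in FWD} | {q for (p, _, q) in FWD}

def indep(t, u):                      # coinitial, distinct labels
    return t != u and t[0] == u[0] and t[1] != u[1]

# union-find over transitions: t ~ t' for opposite sides of a diamond
parent = {t: t for t in FWD}
def find(t):
    while parent[t] != t:
        t = parent[t]
    return t

for t in FWD:
    for u in FWD:
        if not indep(t, u):
            continue
        for s in STATES:
            # u' = (t[2], u[1], s) and t' = (u[2], t[1], s) close the square
            if (t[2], u[1], s) in FWD and (u[2], t[1], s) in FWD:
                parent[find(t)] = find((u[2], t[1], s))   # t ~ t'

events = {}
for t in FWD:
    events.setdefault(find(t), set()).add(t)
print(sorted(len(ev) for ev in events.values()))
```

On this grid the computation finds exactly two events, one per label, each containing the two parallel transitions of that label.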

We are now able to show independence of diamonds (ID), which can be seen as a dual of SP.

**Definition 4.6 (Independence of Diamonds (ID)).** *An LTSI satisfies the Independence of Diamonds property (ID) if whenever we have a diamond* $t : P \xrightarrow{\alpha} Q$*,* $u : P \xrightarrow{\beta} R$*,* $u' : Q \xrightarrow{\beta} S$ *and* $t' : R \xrightarrow{\alpha} S$*, with*

**–** $Q \neq R$ *if* $\alpha$ *and* $\beta$ *are both forward or both backward;*
**–** $P \neq S$ *otherwise;*

*then* $t \mathrel{\iota} u$*.*

**Proposition 4.7.** *If an LTSI satisfies BTI and CPI then it satisfies ID.*

We now consider the interaction between events and causal equivalence. We need some notation first.

**Definition 4.8.** *Let* $r$ *be a path in an LTSI* $\mathcal{L}$ *and let* $e$ *be an event of* $\mathcal{L}$*. Let* $\sharp(r, e)$ *be the number of occurrences in* $r$ *of transitions* $t$ *such that* $t \in e$*, minus the number of occurrences in* $r$ *of transitions* $t$ *such that* $\overline{t} \in e$*.*
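With transitions represented concretely, this count is straightforward to compute. A minimal sketch, assuming a hypothetical encoding of transitions as (source, label, target) triples with reverse labels marked by a "~" prefix:

```python
def rev(t):
    """Reverse of a transition; "~"-prefixed labels mark backward steps."""
    p, l, q = t
    return (q, l[1:], p) if l.startswith("~") else (q, "~" + l, p)

def sharp(r, e):
    """#(r, e): occurrences in path r of transitions of event e,
    minus occurrences of their reverses."""
    return (sum(1 for t in r if t in e)
            - sum(1 for t in r if rev(t) in e))

e = {("P", "a", "Q")}                                       # a one-transition event
r = [("P", "a", "Q"), ("Q", "~a", "P"), ("P", "a", "Q")]    # do, undo, redo
print(sharp(r, e))
```

Here the path does, undoes and redoes the same transition, so the count is 1.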

We now show that $\sharp(r, e)$ is invariant under causal equivalence.

**Lemma 4.9.** *Let* $\mathcal{L}$ *be a pre-reversible LTSI and let* $r \approx s$*. Then for each event* $e$ *we have* $\sharp(r, e) = \sharp(s, e)$*.*

Lemma 4.9 generalises what was shown for the forward-only setting in [33, Corollary 4.3].

**Proposition 4.10.** *If an LTSI is pre-reversible, then for any rooted path* $r$ *and any forward event* $e$ *we have* $\sharp(r, e) \ge 0$*.*

#### **4.2 CS and CL via Independence of Transitions**

We first define causal safety and liveness using the independence relation.

**Definition 4.11.** *Let* <sup>L</sup> = (Proc, Lab,→, ι) *be a pre-reversible LTSI.*


**Fig. 1.**

We may wish to close the independence relation under the following axiom:

**Definition 4.12 (Independence Respects Events (IRE)).** *Whenever* $t \sim t' \mathrel{\iota} u$ *we have* $t \mathrel{\iota} u$*.*

IRE is one of the conditions in the definition of transition systems with independence [33, Definition 3.7]. Together with the axioms for pre-reversibility, it is enough to show both causal safety and causal liveness.

**Theorem 4.13.** *Let a pre-reversible LTSI satisfy IRE. Then it satisfies CS.*

**Theorem 4.14.** *Let a pre-reversible LTSI satisfy IRE. Then it satisfies CL.*

CS and CL are not derivable from CC; we give an example LTSI which satisfies CC but not CS and not CL.

*Example 4.15.* Consider the LTS in Figure 1. Independence is mostly coinitial and given by closing under BTI and CPI. Additionally we make the leftmost a-transition independent with all b-transitions. Note that all a-transitions belong to the same event, and all b-transitions belong to the same event. Also SP and WF hold, so that the LTSI is pre-reversible, and CC holds. However, IRE does not hold. Furthermore, CS fails using Definition 4.11: consider any path $b\,a\,\underline{b}$ from the start. CS would imply that the first $b$-transition is independent with the $a$-transition, but this is not the case (only the leftmost a-transition is independent with b-transitions).

Also CL fails using Definition 4.11: consider any path $a\,b\,\underline{b}$ from the start. Since the leftmost a-transition is independent with all b-transitions, we should be able to reverse $a$ at the end of the path, but this is not possible.

The next axiom states that independence is fully determined by its restriction to coinitial transitions. This is related to axiom (E) of [33, page 325], but here we allow reverse as well as forward transitions.

**Definition 4.16 (Independence of Events is Coinitial (IEC)).** *If* $t_1 \mathrel{\iota} t_2$ *then there are* $t'_1 \sim t_1$*,* $t'_2 \sim t_2$ *such that* $t'_1$ *and* $t'_2$ *are coinitial and* $t'_1 \mathrel{\iota} t'_2$*.*

Thanks to previous axioms, independence behaves well w.r.t. reversing.

**Definition 4.17 (Reversing Preserves Independence (RPI)).** *If* $t \mathrel{\iota} t'$ *then* $\overline{t} \mathrel{\iota} t'$*.*

**Proposition 4.18.** *If an LTSI satisfies SP, CPI, IRE, IEC then it also satisfies RPI.*

All the axioms that we have introduced are independent, i.e. none is derivable from the remaining axioms.

**Proposition 4.19.** *SP, BTI, WF, CPI, IRE, IEC are independent of each other.*

## **4.3 CS and CL via Ordering of Events**

To define CS and CL via ordering of events, we define the causality relation ≤ on events.

**Definition 4.20.** *Let* $\mathcal{L}$ = (Proc, Lab, →, ι) *be an LTSI and let* $e, e'$ *be events of* $\mathcal{L}$*. Let* $e \le e'$ *iff for all rooted paths* $r$*, if* $\sharp(r, e') > 0$ *then* $\sharp(r, e) > 0$*. As usual,* $e < e'$ *means* $e \le e'$ *and* $e \neq e'$*. If* $e < e'$ *we say that* $e$ *is a* cause *of* $e'$*.*

**Lemma 4.21.** *If an LTSI satisfies SP, BTI, WF and CPI then* ≤ *is a partial ordering on events.*

Previously, orderings on events have been defined using forward-only rooted paths; in fact, the definitions coincide for pre-reversible LTSIs.

**Definition 4.22 ([12,28]).** *Let* $\mathcal{L}$ = (Proc, Lab, →, ι) *be an LTSI and let* $e, e'$ *be events of* $\mathcal{L}$*. Let* $e \le_f e'$ *iff for all rooted forward-only paths* $r$*, if* $r$ *contains a representative of* $e'$ *then* $r$ *also contains a representative of* $e$*.*

**Lemma 4.23.** *For any LTSI,* $e \le e'$ *implies* $e \le_f e'$*. If an LTSI satisfies SP, BTI, WF and CPI, then* $e \le_f e'$ *implies* $e \le e'$*.*

*Proof.* Straightforward using PL and Lemma 4.9. 

We now give definitions of causal safety and causal liveness using ordering on events.

**Definition 4.24.** *Let* <sup>L</sup> = (Proc, Lab,→, ι) *be an LTSI.*


We postpone giving proofs of CS$_<$ and CL$_<$ until we have introduced a further definition of causal safety and liveness using independence of events.

#### **4.4 CS and CL via Independent Events**

We now introduce a third version of causal safety and liveness, which uses independence like CS and CL, but on events rather than on transitions. First we lift independence from transitions to events.

**Definition 4.25 (Coinitially independent events).** *Let events* $e, e'$ *be* (coinitially) independent*, written* $e \mathrel{ci} e'$*, iff there are coinitial transitions* $t, t'$ *such that* $[t] = e$*,* $[t'] = e'$ *and* $t \mathrel{\iota} t'$*.*

**Lemma 4.26.** *If an LTSI is pre-reversible, then* $e \mathrel{ci} e'$ *implies* $\overline{e} \mathrel{ci} e'$*.*

Thus in pre-reversible LTSIs, ci is fully determined by considering forward events only. By Lemma 4.26, if we know $e \mathrel{ci} e'$ then we know $und(e) \mathrel{ci} und(e')$.

We can give a third formulation of causal safety and liveness using ci:

**Definition 4.27.** *Let* <sup>L</sup> = (Proc, Lab,→, ι) *be a pre-reversible LTSI.*


Note that in Definition 4.27 we operate at the level of events, rather than at the level of transitions as in Definition 4.11.

**Theorem 4.28.** *If an LTSI is pre-reversible then it satisfies CS$_{ci}$.*

We now introduce a weaker version of axiom IRE (Definition 4.12).

**Definition 4.29 (Coinitial IRE (CIRE)).** *If* $[t] \mathrel{ci} [u]$ *and* $t, u$ *are coinitial then* $t \mathrel{\iota} u$*.*

**Theorem 4.30.** *If a pre-reversible LTSI satisfies CIRE then it satisfies CL$_{ci}$.*

We next give an example where CC holds but not CS$_{ci}$ (and not CPI).

*Example 4.31.* Consider the cube with transitions $a, b, c$ on the left in Figure 2, where the forward direction is from left to right. We add independence as given by BTI. So SP, BTI and WF hold, but not CPI. From the start we have an a-transition followed by a path $r = bc$ followed by $\underline{a}$. For CS$_{ci}$ to hold, we want the $\underline{a}$ to be the reverse of the same event as the first $a$. They are connected by a ladder with sides $cb$. We add independence for all corners on the two faces of the ladder ($ab$ and $ac$). Then we get $bc \approx cb$ (independence at a single corner is enough). However, the $b$s are not the same event, since the $bc$ face does not have independence at each corner. Therefore we do not get $[a] \mathrel{ci} [b]$, and CS$_{ci}$ fails.

We next give an example where CS$_{ci}$ and CL$_{ci}$ hold but not CC.

**Fig. 2.**

*Example 4.32.* Consider the LTSI with $Q_i \xrightarrow{b} P_i$, $P_{i+1} \xrightarrow{c} P_i$, $Q_{i+1} \xrightarrow{c} Q_i$, $P_{i+1} \xrightarrow{a} Q_i$ for $i = 0, 1, \ldots$, shown on the right in Figure 2. Clearly WF does not hold. We add coinitial independence to make BTI and CPI hold. Then SP and CIRE hold as well. However, CC fails since, for example, $P_1 \xrightarrow{a} Q_0 \xrightarrow{b} P_0$ and $P_1 \xrightarrow{c} P_0$ are coinitial and cofinal but not causally equivalent. Note that there are just three events $a, b, c$ with $a \mathrel{ci} c$ and $b \mathrel{ci} c$ but not $a \mathrel{ci} b$. CS$_{ci}$ and CL$_{ci}$ hold: indeed, $c$ is independent from every other action and can always be undone, while $a$ and $b$ are independent from $c$ only, and they can be undone after any path composed of $c$-transitions only.

### **4.5 Polychotomy**

In this section we relate our three versions of causal safety and liveness, with the help of what we call *polychotomy*, which states that if events do not cause each other and are not in conflict, then they must be independent. We start by defining a conflict relation on events.

**Definition 4.33.** *Two forward events* $e, e'$ *are in* conflict*, written* $e \mathrel{\#} e'$*, if there is no rooted path* $r$ *such that* $\sharp(r, e) > 0$ *and* $\sharp(r, e') > 0$*.*

Much as for orderings, conflict on events has been defined previously using forward-only rooted paths [12,28]; in fact, the definitions coincide for pre-reversible LTSIs. We omit the details.

**Definition 4.34 (Polychotomy).** *Let* $\mathcal{L}$ *be a pre-reversible LTSI. We say that* $\mathcal{L}$ *satisfies* polychotomy *if whenever* $e, e'$ *are* forward *events, then exactly one of the following holds: 1.* $e = e'$*; 2.* $e < e'$*; 3.* $e' < e$*; 4.* $e \mathrel{\#} e'$*; or 5.* $e \mathrel{ci} e'$*.*

Property NRE below is related to polychotomy.

**Definition 4.35 (No Repeated Events (NRE)).** *In any rooted path* $r$*, for any forward event* $e$*, we have* $\sharp(r, e) \le 1$*.*

**Lemma 4.36 (Polychotomy).** *Suppose that a pre-reversible LTSI satisfies NRE. Then polychotomy holds.*

**Fig. 3.**

*Example 4.37.* Consider the LTSI in Figure 3. We add independence to make BTI and CPI hold. Both SP and WF hold; hence CC holds as well. There are three events, labelled with $a, b, c$. Clearly NRE fails for both $a$ and $b$. We see that $a < c$ but also $a \mathrel{ci} c$, so that polychotomy fails. CS$_{ci}$ holds by Theorem 4.28. However, CS$_<$ fails: consider the transition $P \xrightarrow{a} Q$ together with the path $r : Q \xrightarrow{bc}{}^{*} R$ and $S \xrightarrow{a} R$, and note that $a < c$.

The next lemma allows us to connect ordered safety and liveness with coinitial safety and liveness.

**Lemma 4.38.** *Suppose that a pre-reversible LTSI satisfies NRE. Suppose* $P \xrightarrow{a} Q$*,* $e = [P, a, Q]$*,* $r : Q \xrightarrow{\rho}{}^{*} R$ *and* $\sharp(r, e') > 0$*, where* $e'$ *is a forward event. Then exactly one of* $e \mathrel{ci} e'$ *and* $e < e'$ *holds.*

**Proposition 4.39.** *Suppose that a pre-reversible LTSI* L *satisfies NRE. Then*


Property RED below is also related to NRE and polychotomy.

**Definition 4.40.** *An LTSI satisfies Reverse Event Determinism (RED) if whenever* $t, t'$ *are coinitial backward transitions and* $t \sim t'$*, then* $t = t'$*.*

**Proposition 4.41.** *If an LTSI* $\mathcal{L}$ *is pre-reversible then the following are equivalent: 1.* $\mathcal{L}$ *satisfies NRE; 2.* $\mathcal{L}$ *satisfies RED; 3. independence* ci *is irreflexive on events; and 4. polychotomy holds.*

**Proposition 4.42.** *Suppose that a pre-reversible LTSI satisfies CIRE. Then it also satisfies NRE.*

NRE was shown in the forward-only setting of occurrence transition systems with independence in [33, Corollary 4.6]. It was also shown in the reversible setting without independence in [28, Proposition 2.10].

*Example 4.43.* Consider the LTSI in Figure 4. Independence is given by closing under BTI and CPI. There are three events, labelled $a, b, c$, which are all independent of each other. We see that NRE holds but not CIRE. Also, CL$_{ci}$ and CL$_<$ fail: consider $P \xrightarrow{a} Q \xrightarrow{b} R$, where $a$ cannot be reversed at $R$.

**Fig. 4.**

**Proposition 4.44.** *Let* L *be a pre-reversible LTSI.*


## **5 Coinitial Independence**

In this section we consider coinitial LTSIs, defined as follows, and their relationship with LTSIs in general.

**Definition 5.1.** *Let* <sup>L</sup> = (Proc, Lab,→, ι) *be a combined LTSI. Then* ι *is* coinitial *if for all transitions* t, u*, if* tιu *then* t *and* u *are coinitial. We say that* <sup>L</sup> *is coinitial if* ι *is coinitial.*

We define a mapping c restricting general independence to coinitial transitions and a mapping g extending independence along events.

**Definition 5.2.** *Given an LTSI* (Proc, Lab, →, ι)*, define* $t \mathrel{g(\iota)} u$ *iff* $t \sim t' \mathrel{\iota} u' \sim u$ *for some* $t', u'$*. Furthermore, define* $t \mathrel{c(\iota)} u$ *iff* $t \mathrel{\iota} u$ *and* $t, u$ *are coinitial.*
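Operationally, the two mappings are one-liners over finite relations. A minimal sketch on hypothetical data (an explicit event assignment and an independence relation given as a set of ordered pairs; none of the names below come from the paper):

```python
def c(indep):
    """Restrict an independence relation to coinitial pairs."""
    return {(t, u) for (t, u) in indep if t[0] == u[0]}

def g(indep, event_of):
    """Extend independence along events: t g(i) u iff t' i u' for
    some t' in the event of t and u' in the event of u."""
    return {(t, u)
            for (t2, u2) in indep
            for t in event_of if event_of[t] == event_of[t2]
            for u in event_of if event_of[u] == event_of[u2]}

t1, t2 = ("P", "a", "Q"), ("R", "a", "S")   # two transitions of event "ea"
u1 = ("P", "b", "R")                        # a transition of event "eb"
event_of = {t1: "ea", t2: "ea", u1: "eb"}
iota = {(t1, u1), (u1, t1)}                 # coinitial independence at P

G = g(iota, event_of)
print(len(G), len(c(G)))
```

Here g propagates the independence of t1 and u1 to the non-coinitial pair (t2, u1), and c then keeps only the coinitial pairs again.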

**Proposition 5.3.** *Let* <sup>L</sup> = (Proc, Lab,→, ι) *be a pre-reversible LTSI.*


Thanks to Proposition 5.3, we can extend a coinitial pre-reversible LTSI satisfying CIRE in a canonical way to a pre-reversible LTSI satisfying IRE and IEC.

In some reversible calculi (such as RCCS) independence of coinitial transitions is defined purely by reference to the labels. If this is the case it is a simple matter to verify the axioms CPI and CIRE.

**Proposition 5.4.** *Let* $\mathcal{L}$ = (Proc, Lab, →, ι) *be a coinitial combined LTSI. Suppose that* $I$ *is a binary relation on* Lab *such that for any coinitial transitions* $t : P \xrightarrow{\alpha} Q$ *and* $u : P \xrightarrow{\beta} R$ *we have* $t \mathrel{\iota} u$ *iff* $I(a, b)$*, where* $a$ *and* $b$ *are the underlying labels* $a = und(\alpha)$*,* $b = und(\beta)$*. Then* $\mathcal{L}$ *satisfies CPI and CIRE.*

*Proof.* Straightforward, noting that labels on opposite sides of a diamond of transitions must be equal. 

Note that I must be irreflexive, since ι is irreflexive.
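Proposition 5.4 can be checked exhaustively on a finite example. The sketch below uses a hypothetical 2x2 grid over actions a, b with label relation I = {(a, b), (b, a)}, and verifies the CPI conclusion (read here as: the transition u' opposite u in a diamond is independent with the reverse of t) over every commuting diamond:

```python
FWD = {("00", "a", "10"), ("01", "a", "11"),
       ("00", "b", "01"), ("10", "b", "11")}
TRANS = FWD | {(q, "~" + a, p) for (p, a, q) in FWD}

und = lambda l: l.lstrip("~")
I = {("a", "b"), ("b", "a")}        # irreflexive relation on labels

def indep(t, u):                    # label-based coinitial independence
    return t != u and t[0] == u[0] and (und(t[1]), und(u[1])) in I

def rev(t):
    p, l, q = t
    return (q, l[1:], p) if l.startswith("~") else (q, "~" + l, p)

def check_CPI():
    for t in TRANS:
        for u in TRANS:
            if not indep(t, u):
                continue
            for u2 in TRANS:        # candidate u' out of the target of t
                if u2[0] != t[2] or u2[1] != u[1]:
                    continue
                # the diamond closes iff t' = (u[2], t[1], u2[2]) exists
                if (u[2], t[1], u2[2]) in TRANS and not indep(u2, rev(t)):
                    return False
    return True

print(check_CPI())
```

As the proposition predicts, the check succeeds: labels on opposite sides of a diamond are equal, so label-based independence propagates around every corner.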

If we have a coinitial pre-reversible LTSI satisfying CIRE, then CS$_<$ and CL$_<$ hold (using Proposition 4.42 and Proposition 4.39). Applying mapping $g$ we get a general pre-reversible LTSI satisfying IRE and IEC by Proposition 5.3. This will satisfy CS and CL as a result of applying Theorem 4.13 and Theorem 4.14 respectively. It will also satisfy CS$_<$ and CL$_<$. Conversely, if we have a general pre-reversible LTSI satisfying IRE, then CS and CL hold by Theorem 4.13 and Theorem 4.14 respectively. Applying mapping $c$ we get a coinitial pre-reversible LTSI satisfying CIRE. This will satisfy CS$_<$ and CL$_<$.

## **6 Case Studies**

We look at whether our axioms hold in various reversible formalisms. Remarkably, all the works below provide proofs of the Loop Lemma.

**RCCS** We consider here the semantics of RCCS in [6], and restrict attention to coherent processes [6, Definition 2]. In RCCS, transitions $P \xrightarrow{\mu:\zeta} Q$ and $P \xrightarrow{\mu':\zeta'} Q'$ are concurrent if $\mu \cap \mu' = \emptyset$ [6, Definition 7]. This allows us to define coinitial independence as $t \mathrel{\iota} u$ iff $t$ and $u$ are concurrent. We now argue that the resulting coinitial LTSI is pre-reversible and also satisfies CIRE. SP was shown in [6, Lemma 8]. BTI was shown in the proof of [6, Lemma 10]. WF is straightforward, noting that backward transitions decrease the memory size. Hence, we obtain a very much simplified proof of CC. For CPI and CIRE we note that independence is defined on the underlying labels and thus Proposition 5.4 applies. Therefore CS$_<$ and CL$_<$ hold. Using Proposition 5.3, we can get an LTSI with general independence satisfying IRE and IEC, and therefore CS and CL. This is the first time these causal properties have been proved for RCCS.

**HO***<sup>π</sup>* We consider here the uncontrolled reversible semantics for HOπ [18]. We restrict our attention to reachable processes, called there consistent. The semantics is a reduction semantics; hence there are no labels (or, equivalently, all the labels coincide). To have more informative labels we can consider the transitions defined in [18, Section 3.1], where labels are composed of memory information and a flag denoting whether the transition is forward or backward. The notion of independence would be given by the concurrency relation on coinitial transitions [18, Definition 9]. All pre-reversible LTSI axioms hold, as well as CIRE which is needed for causal safety and liveness. Specifically, SP is proved in [18, Lemma 9]. BTI holds since distinct memories have disjoint sets of keys [18, Definition 3 and Lemma 3] and by the definition of concurrency [18, Definition 9]. WF holds as each backward step consumes a memory, which is finite to start with. Finally, CPI and CIRE are valid since the notion of concurrency is defined on the annotated labels and using our Proposition 5.4.

As a result we obtain a very much simplified proof of CC. Moreover, using CPI and CIRE, we get the CS$_<$ and CL$_<$ safety and liveness properties and, applying mapping $g$ from Section 5, we get a general pre-reversible LTSI satisfying IRE and IEC, hence CS and CL are satisfied. This is the first time that causal properties have been shown for HOπ.

**<sup>R</sup>***<sup>π</sup>* We consider the (uncontrolled) reversible semantics for π-calculus defined in [5]. We restrict the attention to reachable processes. The semantics is an LTS semantics. Independence is given as concurrency which is defined for consecutive transitions [5, Definition 4.1]. CC holds [5, Theorem 4.5].

Our results are not directly applicable to Rπ, since SP holds up to label equivalence of transitions on opposite sides of the diamond, rather than equality of labels as in our approach. We would need to extend axiom SP and the definition of causal equivalence to allow for label equivalence in order to handle Rπ using our axiomatic method.

**Erlang** We consider the uncontrolled reversible (reduction) semantics for Erlang in [20]. We restrict our attention to reachable processes. In order to have more informative labels we can consider the annotations defined in [20, Section 4.1]. We then can define coinitial transitions to be independent if they are concurrent [20, Definition 12].

We next discuss the validity of our axioms in reversible Erlang. SP is proved in [20, Lemma 13] and BTI is trivial from the definition of concurrency [20, Definition 12]. WF holds since the pairs of integers (total number of elements in memories, total number of messages queued), ordered lexicographically, are always positive and decrease at each backward step. Intuitively, each step except those derived using the rule for reverse sched (see [20, Figure 11]) consumes an item of memory, and each step derived using rule reverse sched removes a message from a process queue. Finally, CPI and CIRE hold since the notion of concurrency is defined on the annotated labels, and by Proposition 5.4.

Since the setting is very similar to that of HOπ (both calculi have a reduction semantics and a coinitial notion of independence defined on enriched labels), we get the same results as for HOπ, including CC, CS and CL.

**Reversible occurrence nets** Reversible occurrence nets [25,24] are traditional occurrence nets (safe and with no backward conflicts) extended with a reverse transition for each forward transition. They give rise to an LTS where states are pairs $(N, m)$ with $N$ a net and $m$ a marking. Firing a transition $t$ in $(N, m)$ and obtaining $(N, m')$ is represented by the firing relation $(N, m) \xrightarrow{t} (N, m')$. The notion of independence is the concurrency relation [25, Section 3], which is defined between arbitrary firings (transitions); hence we get a general LTSI. The CC property is shown by following the traditional approach of [6]. SP and PL are shown as well; PL and CC require several pages of proofs [24]. The causal safety and causal liveness properties are not considered in [25,24].

We can obtain CC, and additionally CS and CL, as follows. SP and BTI are proved for reversible occurrence nets in [24] as Lemma 4.3 and Lemma 3.3 respectively. WF holds because there are no forward cycles of firings in occurrence nets, hence no infinite reverse paths. In order to have CS and CL, we need to show CPI and IRE. Lemma 3.4 in [24] gives CPI. Events can be defined on firings as in our Definition 4.5, and then IRE holds as the concurrency relation preserves such events.

## **7 Conclusion, Related and Future Work**

The literature on causal-consistent reversibility (see, for example, the early survey [19]) has a number of proofs of results such as the Parabolic Lemma (PL) and the causal consistency property (CC), all of which are instantiated to a specific calculus, language or formalism. We have taken here a complementary approach, analysing the properties of interest in an abstract and language-independent setting. In particular, we have shown how to prove the most relevant of these properties from a small number of axioms.

Our approach builds upon [28], where a set of axioms for reverse LTSs was given and several interesting properties were shown. While the idea is similar, the development is rather different since we consider more basic axioms (we only share WF, while many of the axioms in [28], such as UT, follow from ours), and since the two papers focus on different properties. We focus on CC and various forms of CS and CL, while [28] considers correspondence with prime event structures and reversible bisimulations. Moreover, LTSs in [28] do not have a notion of independence.

In other related work, we may particularly mention [8], which like ours takes an abstract view, though based on category theory. However, its results concern irreversible actions, and do not provide insights in our setting, where all actions are reversible. The only other work which takes a general perspective is [3], which concentrates on how to derive a reversible extension of a given formalism. However, its proofs concern a limited number of properties (essentially our CC), and hold only for extensions built using the technique proposed there. Also [27,29] are general, since they propose how to reverse a calculus defined in a general format of SOS rules. However, that format has its own syntactic constraints, while our approach abstracts from them. Finally, [9] presents a number of properties, such as backward confluence, which arise in the context of reversing steps of executed transitions in Place/Transition nets.

The approach proposed in this paper opens a number of new possibilities. Firstly, when devising a new reversible formalism, our results provide a rich toolbox to prove (or disprove) relevant properties in a simple way. This is particularly relevant since causal-consistent reversibility is getting applied to more and more complex languages, such as Erlang [20], where direct proofs become cumbersome and error-prone. Secondly, our abstract proofs are relatively easy to formalise in a proof-assistant, which is even more relevant given that this will certify the correctness of the results for many possible instances. Another possible extension of our work concerns integrating into our framework irreversible actions [7]. In order to do that we could take inspiration from the above-mentioned [8].

## **References**



## **An Auxiliary Logic on Trees: on the Tower-hardness of logics featuring reachability and submodel reasoning**

Alessio Mansutti

LSV, CNRS, ENS Paris-Saclay, Université Paris-Saclay, mansutti@lsv.fr

**Abstract.** We describe a set of simple features that are sufficient to make the satisfiability problem of logics interpreted on trees TOWER-hard. We exhibit these features through an *Auxiliary Logic on Trees* (ALT), a modal logic that essentially deals with reachability of a fixed node inside a forest and features modalities from sabotage modal logic to reason on submodels. After showing that ALT admits a TOWER-complete satisfiability problem, we prove that this logic is captured by four other logics that were independently found to be TOWER-complete: two-variable separation logic, quantified computation tree logic, modal logic of heaps and modal separation logic. As a by-product of establishing these connections, we discover strict fragments of these logics that are still non-elementary.

## **1 Introduction**

In mathematical logic there is a well-known trade-off between expressive power and complexity, where weaker languages cannot capture interesting properties of complex systems, whereas finding solutions of a given problem is infeasible for richer languages. For instance, many verification tasks, such as reachability and homomorphisms queries, happen to be expressible in monadic second-order logic (MSO) [15]. This logic is however not usable in practice, as its satisfiability problem SAT(MSO) is undecidable in general and was famously proved by Rabin [36] to be decidable but non-elementary when the logic is interpreted on trees or on one unary function. A more recent analysis that uses the hierarchy of non-elementary ranking functions [38] classifies SAT(MSO) on these two structures as TOWER-complete, i.e. complete for the class of problems of time complexity bounded by a tower of exponentials, whose height is an elementary function of the input.
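To make the bound concrete, the tower of exponentials that bounds the running time of TOWER problems can be sketched as follows (a minimal illustration in Python; the function name is ours):

```python
def tower(height: int, base: int = 2) -> int:
    """Tower of exponentials: tower(0) = 1 and tower(h) = base ** tower(h - 1)."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

# A problem is in TOWER if it is solvable in time tower(h(input))
# for some *elementary* function h of the input.
```

Already tower(3) = 16 and tower(4) = 65536, while tower(5) = 2^65536 has almost twenty thousand decimal digits.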

In order to bypass these problems, a general approach is to design restrictions of MSO that can solve complex reasoning tasks while being more appealing complexity-wise. An example of this is given by the framework of temporal logics, formalisms that describe the evolution of reactive systems [24]. Among the various temporal logics, from the classical linear temporal logic (LTL) [39] and computation tree logic (CTL) [13], as well as their fragments [2,33], to the more recently developed interval temporal logics [7,8], the main common feature of this framework is perhaps the ability to check whether the system can evolve to a certain configuration, i.e. a *reachability* query. In this context, we recall the landmark result on the satisfiability of CTL, shown EXPTIME-complete by Fischer and Ladner [23]. Another possibility to deal with the complexity of MSO is to restrict the second-order quantifications to specific *submodels*. This is the idea behind ambient logic [16], separation logic [37] and, more generally, bunched logics [35]

c The Author(s) 2020

and graph logics [1]. These logics provide primitives for reasoning about resource composition, mainly through a *spatial conjunction* ∗: the formula φ ∗ ψ requires splitting the model into two disjoint pieces, one satisfying φ and the other satisfying ψ. Similar ideas are developed in sabotage modal logics, where the formula ⧫φ, headed by the *sabotage* modality ⧫, states that φ must hold in a graph obtained by removing one edge from the current model [4,21]. Within these logics, we highlight the quantifier-free fragment of separation logic restricted to the ∗ operator, denoted here by SL(∗), whose satisfiability problem is proved PSPACE-complete in [12].

Once a framework provides a solid foundation for reasoning tasks, a natural step is to extend its expressiveness while keeping its complexity in check. Sometimes the additional capabilities do not change the complexity of the logic, as for example SL(∗) extended with reachability predicates, whose satisfiability problem is still PSPACE-complete [20]. However, it often happens that the new features make the problem jump to higher complexity classes and, sometimes, reach the complexity of MSO. We pinpoint two instances of this:


Consequently, it is natural to ask why the additional features made the problem harder. Answering this question requires studying the interplay between the various operators of the logic, searching for a sufficient set of conditions explaining its complexity.

Our motivation. Second-order features often lead to logics with TOWER-hard satisfiability problems, as illustrated above for first-order SL(∗) and QCTL<sup>t</sup>. A good amount of research has been carried out independently on these logics [5,9,17,28], culminating with the TOWER-hardness of SL(∗) with two quantified variables [17] and the TOWER-hardness of QCTL<sup>t</sup> with just one temporal operator among *exists-finally* and *exists-next* [5] (see Section 4 for the definitions). Connections between these two formalisms have not been explicitly developed so far, perhaps because the logics are quite different: QCTL<sup>t</sup> is built on top of propositional calculus and is interpreted on infinite trees, whereas SL(∗) does not feature propositional symbols and is essentially interpreted on finite structures. Nevertheless, we argue that these and other logics are related not only because they are fragments of MSO, but also because they share a form of reachability and an ability to reason on submodels which is sufficient to obtain TOWER-hard logics.

Our contribution. We make these common features leading to TOWER-hard logics explicit by relying on an *Auxiliary Logic on Trees* (ALT), introduced in Section 2. ALT reasons about reachability of a fixed *target node* inside a finite forest and features modalities from sabotage logics to reason on submodels. Here, reachability should be understood as the ability to reach the target node in at least one step, starting from a "current" node which can be updated thanks to the existential modality *somewhere* ⟨U⟩ [26]. In Section 3, we take a look at the expressive power of ALT and show that SAT(ALT) is TOWER-hard. In Section 4, we then show how ALT is captured by first-order SL(∗) and QCTL<sup>t</sup>, as well as by modal logic of heaps (MLH) and modal separation logics (MSL), two other logics introduced in [17] and [18], respectively. In this context, besides showing that all these logics are TOWER-hard because of the way they reason about reachability and submodels, we discover interesting sublogics that are still TOWER-complete:


## **2 The definition of an Auxiliary Logic on Trees**

We introduce an Auxiliary Logic on Trees (ALT). Its formulae are from the grammar:

φ ∶= *⊤* ∣ φ ∧ φ ∣ ¬φ ∣ hit ∣ miss ∣ ⟨U⟩φ ∣ ⧫φ ∣ ⧫∗φ

As we will soon clarify, the symbol ⟨U⟩ is borrowed from Goranko and Passy's paper on modal logic with the universal modality [26]. Similarly, readers who are familiar with sabotage modal logics will recognise in ⧫ the sabotage modality [4], and in ⧫∗ its Kleene closure (i.e. ⧫ applied an arbitrary number of times). These two operators modify the model during the evaluation of a formula, making ALT a *relation-changing* modal logic (following the terminology used in [3]). However, contrary to most modal logics, ALT does not feature classical propositional symbols. Instead, this logic only features two interpreted atomic propositions, hit and miss. Roughly speaking, hit stands for "the target node is reachable" whereas miss stands for "the target node is not reachable". The formal definitions given below clarify these two sentences.

Let 𝒩 be a countably infinite set of *nodes*. A *(finite) forest* f ∶ 𝒩 → 𝒩 is a partial function (encoding the standard parent relation) that has a finite domain dom(f) and is acyclic, i.e. for every node n and every k ≥ 1, fᵏ(n) ≠ n. Here, fᵏ denotes k ≥ 0 *functional composition(s)* of f. Albeit non-standard, our definition of finite forests over an infinite set of nodes simplifies the forthcoming definitions. Besides, in Section 3.2 we show that restricting 𝒩 to a finite set changes neither the expressive power nor the complexity of ALT.

We denote the image of f as ran(f) = {n′ ∣ f(n) = n′ for some n ∈ dom(f)}. Given a finite set S, we denote with |S| its cardinality. Let n, n′ be two nodes. As usual, n is an *f-descendant* of n′ (alternatively, n′ is an *f-ancestor* of n) whenever fᵏ(n) = n′ for some k ≥ 1. In this case, if k = 1 then n is an *f-child* of n′ (alternatively, n′ is the *f-parent* of n). We drop the prefix f- from these terms when it is clear from the context. Given two forests f, f′, we say that f′ is a *subforest* of f, written f′ *⊑* f, whenever f′(n) = f(n) for every n ∈ dom(f′). Figure 1 intuitively represents two forests (every dot represents a node), the one on the left being a subforest of the one on the right.
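These definitions can be made concrete with a toy representation: below, a forest is a Python dict mapping each node of dom(f) to its parent (the encoding and function names are ours, not the paper's):

```python
def is_subforest(f_sub: dict, f: dict) -> bool:
    """f_sub ⊑ f: f_sub agrees with f on every node of dom(f_sub)."""
    return all(n in f and f[n] == parent for n, parent in f_sub.items())

def is_descendant(f: dict, n, m) -> bool:
    """n is an f-descendant of m: f^k(n) = m for some k >= 1.
    Terminates because finite forests are acyclic."""
    while n in f:
        n = f[n]
        if n == m:
            return True
    return False
```

On the forest {1: 0, 2: 1}, node 2 is a descendant of 0, and {1: 0} is a subforest of it.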

ALT is interpreted on *pointed forests* (f, t, n), where f is a forest and t, n ∈ 𝒩 are respectively called the *target node* and the *current evaluation node*. The satisfaction relation *⊧* is defined (throughout the paper, we omit standard clauses for *⊤*, ∧, ¬) as:

(f, t, n) *⊧* hit ⇔ n is an f-descendant of t.
(f, t, n) *⊧* miss ⇔ n ∈ dom(f) and (f, t, n) ̸*⊧* hit.
(f, t, n) *⊧* ⟨U⟩φ ⇔ there is n′ ∈ 𝒩 s.t. (f, t, n′) *⊧* φ.
(f, t, n) *⊧* ⧫φ ⇔ there is f′ s.t. f′ *⊑* f, |dom(f′)| + 1 = |dom(f)| and (f′, t, n) *⊧* φ.
(f, t, n) *⊧* ⧫∗φ ⇔ there is f′ s.t. f′ *⊑* f and (f′, t, n) *⊧* φ.

**Fig. 1.** Subforest relation

We denote with ⟂ the contradiction ¬*⊤*. The standard connectives ∨ and ⇒ are defined as usual. The semantics of hit and miss is pretty straightforward. As a visual aid, the nodes in Figure 1 satisfying hit are the ones in the dark grey area, whereas the ones in the light grey area satisfy miss. As stated before, the semantics given to ⟨U⟩ is that of the existential modality *somewhere* [26], stating that there is a way to change the current evaluation node so that φ becomes true. Its dual operator [U]φ = ¬⟨U⟩¬φ is the universal modality *everywhere*. The semantics given to ⧫ is that of the *sabotage* modality from [4], which requires finding one edge of the forest that, when removed, makes the model satisfy φ. Lastly, the ⧫∗ modality, here called the *repeated sabotage* operator, can be seen as the operator obtained by applying ⧫ an arbitrary number of times. Indeed, by inductively defining ⧫ᵏφ (k ∈ ℕ) as the formula φ for k = 0 and otherwise (k ≥ 1) as ⧫⧫ᵏ⁻¹φ, it is easy to see that (f, t, n) *⊧* ⧫∗φ is equivalent to ∃k ∈ ℕ. (f, t, n) *⊧* ⧫ᵏφ.
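The whole satisfaction relation can be prototyped directly. The sketch below uses our own toy encoding (not the paper's): formulae are nested tuples, a forest is a child-to-parent dict, and, since ⟨U⟩ ranges over the infinite set 𝒩, we approximate it with a finite candidate set of nodes, which is harmless as long as it contains dom(f) ∪ ran(f), t, n and one fresh node:

```python
from itertools import combinations

def sat(f, t, n, phi, nodes):
    """Check (f, t, n) ⊧ phi for tuple-encoded ALT formulae.
    `nodes` is a finite set approximating the range of <U>."""
    op = phi[0]
    if op == "top":
        return True
    if op == "not":
        return not sat(f, t, n, phi[1], nodes)
    if op == "and":
        return sat(f, t, n, phi[1], nodes) and sat(f, t, n, phi[2], nodes)
    if op == "hit":           # n is an f-descendant of the target t
        m = n
        while m in f:
            m = f[m]
            if m == t:
                return True
        return False
    if op == "miss":          # n in dom(f), but t is not reachable from n
        return n in f and not sat(f, t, n, ("hit",), nodes)
    if op == "U":             # somewhere: move the current evaluation node
        return any(sat(f, t, m, phi[1], nodes) for m in nodes)
    if op == "sab":           # sabotage: remove exactly one edge
        return any(sat({k: v for k, v in f.items() if k != m}, t, n, phi[1], nodes)
                   for m in f)
    if op == "sab*":          # repeated sabotage: keep any subset of the edges
        return any(sat({k: f[k] for k in keep}, t, n, phi[1], nodes)
                   for size in range(len(f) + 1)
                   for keep in combinations(f, size))
    raise ValueError(f"unknown operator {op!r}")

# <U>(hit ∧ ¬⧫miss): the target node has at least one child (see Section 2)
child1 = ("U", ("and", ("hit",), ("not", ("sab", ("miss",)))))
```

On f = {1: 0, 2: 1} with target 0 the formula child1 holds (node 1 is a child of 0), while it fails when no node reaches the target.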

Given a pointed forest (f, t, n), we denote with 𝒢ₜ(f) the set of its *garbage nodes*: the set of elements in dom(f) that are not descendants of t, i.e. 𝒢ₜ(f) = {n ∈ dom(f) ∣ ∀k ≥ 1, fᵏ(n) ≠ t}. Then, 𝒢ₜ(f) is equivalent to {n ∈ 𝒩 ∣ (f, t, n) *⊧* miss}. We omit the subscript t from 𝒢ₜ(f) when it is clear from the context. We augment the standard precedence rules of propositional logic so that the modalities ⟨U⟩, ⧫ and ⧫∗ have the same precedence as ¬. For instance, the formula ⟨U⟩φ ∧ ψ should be read as (⟨U⟩φ) ∧ ψ.
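Continuing the toy dict-of-parents representation (the naming is ours), the garbage nodes of a pointed forest can be computed as:

```python
def garbage(f: dict, t) -> set:
    """Garbage nodes G_t(f): elements of dom(f) whose parent chain never reaches t."""
    def reaches_target(n):
        while n in f:
            n = f[n]
            if n == t:
                return True
        return False
    return {n for n in f if not reaches_target(n)}
```

On f = {1: 0, 2: 1, 3: 9, 4: 3} with target 0, the garbage nodes are {3, 4}: exactly the nodes satisfying miss.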

Satisfiability problem. As usual, given a logic 𝔏 and one of its interpretations *⊧* on a class of structures ℭ, the satisfiability problem of 𝔏, denoted with SAT(𝔏) when the interpretation is clear from the context, takes as input a formula φ of 𝔏 and asks whether there is a structure 𝔐 ∈ ℭ such that 𝔐 *⊧* φ. If the answer is positive, then φ is *satisfiable*.

Where does ALT come from? A preliminary definition of ALT was introduced in [31] to reason on the complexity of separation logic. As such, in [31] ALT features the separating conjunction ∗ from separation logic: φ ∗ ψ states that the forest can be partitioned into two disjoint subforests, one satisfying φ and one satisfying ψ. This binary operator generalises both the ⧫ and ⧫∗ operators (we show how in Section 4). Hence, the TOWER-hardness of the satisfiability problem for the logic defined here cannot be inherited from [31] and must be proved (Section 3). Unfortunately, the proof does not give any indication on whether or not the two versions of ALT have the same expressive power. What is clear is that the two logics analyse the model in different ways: the ∗ operator is able to reason on the model in a "concurrent" way, whereas ⧫ and ⧫∗ do it in a "sequential" one. Let us draw an example of this. Let (f, t, n) be a pointed forest. We aim at defining a formula #ch_trg ≥ 2 stating that the target node t has at least two children. First, we define #ch_trg ≥ 1 (the formula for just one child) as ⟨U⟩(hit ∧ ¬⧫miss). Intuitively, #ch_trg ≥ 2

can then be defined with the ∗ operator simply as the formula #ch_trg ≥ 1 ∗ #ch_trg ≥ 1, stating that it is possible to partition the forest into two subforests, each containing at least one child of t. With the ⧫ operator, this property is instead defined as

$$\#\mathsf{ch}_{\mathsf{trg}} \geq 2 \overset{\text{def}}{=} \langle \mathsf{U} \rangle \big( \mathsf{hit} \land \neg\,\blacklozenge\,\mathsf{miss} \land \blacklozenge(\neg\,\mathsf{inDom} \land \#\mathsf{ch}_{\mathsf{trg}} \geq 1) \big)$$

where inDom = hit ∨ miss states that the current evaluation node is in the domain of the forest. This definition of #ch_trg ≥ 2 requires finding one child of t (as encoded by the "⟨U⟩(hit ∧ ¬⧫miss ∧ ···" part of the formula) and removing it from the model (as expressed by the "⧫(¬inDom ∧ ···" part). Only afterwards do we check for the existence of a second child of t. This form of "sequential reasoning" (which can often be avoided when using the ∗ operator) is used in almost all the formulae of the next sections: we first find a node satisfying a certain property, we remove it from the structure, and only afterwards we check whether the model satisfies a second property. This principle only works well for monotonic properties: with respect to the definition of #ch_trg ≥ 2, the set of children of t monotonically decreases when considering subforests. Thus, finding a child of t in the subforest implies finding a child of t in the original forest.

## **3 On the complexity and expressive power of** ALT

In this section, we show that SAT(ALT) is TOWER-hard by reduction from the satisfiability problem of Propositional Interval Temporal Logic on finite words (Section 3.3). The proof adapts the arguments used in [31] for the version of ALT featuring the separating conjunction ∗. The reduction is somewhat non-intuitive, and in [31] it is given without explaining why more direct ways fail. Here, we clarify this issue, which is related to the fact that ALT cannot deduce any property of the portion of a pointed forest (f, t, n) corresponding to the nodes in 𝒢(f), except for the size of 𝒢(f) and the query n ∈ 𝒢(f). This is done in Section 3.2, by relying on a notion of Ehrenfeucht-Fraïssé games for ALT.

## **3.1 Towards the TOWER-hardness of** SAT(ALT)**: how to encode finite words.**

As a first step, we define a correspondence between finite words and specific pointed forests. As usual, the set of finite words over a finite alphabet Σ is the Kleene closure Σ∗. To ease our modelling, we suppose Σ = [1, n] to be the alphabet of natural numbers between 1 and n. Let w = a₁···aₖ be a k-symbol word in Σ∗ and U = {u₁,···,uₖ} be a set of k nodes. Let Sᵢ (i ∈ [1, k]) be a set of aᵢ + 1 nodes different from u₁,···,uₖ and such that for each distinct i, j ∈ [1, k], Sᵢ ∩ Sⱼ = ∅. Lastly, let t be a node not in U ∪ S₁ ∪ ··· ∪ Sₖ. A pointed forest (f, t, n) encodes w w.r.t. the sets U, S₁,···,Sₖ iff (**I**) f(uₖ) = t, (**II**) for all i ∈ [1, k−1], f(uᵢ) = uᵢ₊₁, (**III**) for all i ∈ [1, k] and u′ ∈ Sᵢ, f(u′) = uᵢ and (**IV**) every f-descendant of t belongs to a set among U, S₁,···,Sₖ.

We call the path from u₁ to uₖ the *main path* of f. The nodes of this path are the ones in U, and can be characterised as the only descendants of t with at least one child. Moreover, u₁ is the only node of the main path having the same number of descendants and children. We say that a node u ∈ dom(f) *encodes* the symbol a ∈ Σ if it is a descendant of t and it has exactly a + 1 children that are not in U. Then, the nodes in U are the only ones encoding symbols, where uᵢ encodes aᵢ for any i ∈ [1, k]. For instance, Figure 2 shows an encoding of the word 1121.
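A hedged sketch of this construction (function and variable names are ours): given a word over [1, n] as a list of integers, it builds the dict-of-parents forest with a main path below the target and aᵢ + 1 leaf children per path node:

```python
from itertools import count

def encode_word(word):
    """Encode a word over [1, n] as a pointed forest (f, t),
    following conditions (I)-(IV) of the definition above."""
    fresh = count()                              # generator of fresh node ids
    f = {}
    t = next(fresh)                              # target node
    path = [next(fresh) for _ in word]           # main path u_1 ... u_k
    for i, (u, a) in enumerate(zip(path, word)):
        # (I)-(II): u_k is a child of t, u_i is a child of u_{i+1}
        f[u] = t if i == len(word) - 1 else path[i + 1]
        # (III): a + 1 leaf children, so that u_i encodes the symbol a
        for _ in range(a + 1):
            f[next(fresh)] = u
    return f, t, path
```

For w = 1121, u₁ ends up with 2 children and 2 descendants (the "first node" property), while every later path node gets one extra child, namely its predecessor on the path.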

In order to characterise the class of pointed forests encoding finite words, we adapt the formulae of [31] shown in Table 1 (where their semantics is described). Let (f, t, n) be a pointed forest and let β ∈ ℕ. The formula |𝒢| ≥ β is inductively defined as: |𝒢| ≥ 0 = *⊤*, and |𝒢| ≥ β+1 = ⟨U⟩(miss ∧ ⧫(¬inDom ∧ |𝒢| ≥ β)).

Notice how, in the definition of |𝒢| ≥ β+1, we use the same principle used to encode #ch_trg ≥ 2 at the end of Section 2: we first find a node in 𝒢(f), remove it from the model, and then find the other elements of 𝒢(f). The formulae #desc ≥ β and #child ≥ β (again, we refer to Table 1 for their semantics) are instead defined as:

$$\#\mathsf{desc} \geq \beta \overset{\text{def}}{=} \blacklozenge^{*}\Big(\, \underbrace{[\mathsf{U}]\,\neg\,\mathsf{miss}}_{\mathcal{G}(f)\text{ is empty}} \land\ \mathsf{hit}\ \land \underbrace{\blacklozenge(\neg\,\mathsf{inDom} \land |\mathcal{G}| \geq \beta)}_{\text{removing } n \text{ leads to a set of garbage nodes of size at least } \beta} \Big)$$

$$\#\mathsf{child} \geq 0 \overset{\text{def}}{=} \mathsf{hit}, \qquad \#\mathsf{child} \geq \beta{+}1 \overset{\text{def}}{=} \#\mathsf{desc} \geq \beta{+}1 \land \underbrace{\neg\,\blacklozenge^{\beta}(\mathsf{hit} \land \neg(\#\mathsf{desc} \geq 1))}_{\substack{\text{whenever } \beta \text{ nodes of } \mathrm{dom}(f) \text{ are removed, if } n \text{ still} \\ \text{reaches } t \text{ then it has at least one descendant}}}$$

Given π ∈ {|𝒢|, #desc, #child, #ch_trg}, we write π = β for π ≥ β ∧ ¬(π ≥ β+1). For instance, #child = β is the formula that checks whether n has *exactly* β children and is a descendant of t. We can now conclude the encoding of finite words.

Let (f, t, n) be a pointed forest encoding w ∈ Σ∗ and let U be the set of nodes in its main path. Let us recall two properties of our encoding: (I) a node u′ encodes a symbol of w iff u′ ∈ U, and (II) the node encoding the first symbol of w is the only node in U with the same number of descendants and children. To reflect (I) we denote with symb the formula #child ≥ 1. For (II), given A *⊆* Σ, we introduce the formula 1st_A that checks whether the current evaluation node corresponds to the first node of the main path and encodes a symbol in A. It is defined as ⋁_{a∈A}(#child = a+1 ∧ #desc = a+1). The following statement formalises the connection between this formula and property (II) stated above.

**Lemma 1.** *Let* w ∈ Σ⁺*. Let* (f, t, n) *be a pointed forest encoding* w*. Let* u₁ *be the first node in the main path of* f*. For every* A *⊆* Σ*,* (f, t, n) *⊧* ⟨U⟩1st_A *iff* (f, t, u₁) *⊧* 1st_A*.*

We are finally ready to define the formula word_Σ, characterising the class of forests that encode words in Σ∗. It is proved correct by Lemma 2, and is defined as follows:

$$\begin{array}{l} \mathsf{word}_{\Sigma} \overset{\text{def}}{=} \overbrace{(\langle\mathsf{U}\rangle\,\mathsf{hit} \Rightarrow \langle \mathsf{U} \rangle\, \mathsf{symb})}^{\substack{\text{the target node has no descendants, or} \\ \text{has a descendant that encodes a symbol}}} \land\ \neg\, \#\mathsf{ch}_{\mathsf{trg}} \geq 2\ \land \\[4pt] [\mathsf{U}]\big(\mathsf{symb} \Rightarrow \mathsf{1st}_{\Sigma} \lor \underbrace{(\neg\, \mathsf{1st}_{\{n+1\}} \land \blacklozenge\, \mathsf{1st}_{\Sigma})}_{\substack{\text{the current node encodes a symbol in } [1,n] \text{ and} \\ \text{exactly one of its children encodes a symbol}}}\big). \end{array}$$

**Lemma 2.** *A pointed forest* (f, t, n) *is an encoding of a word in* Σ∗ *iff* (f, t, n) *⊧* word_Σ*.*

**game played on** ((f₁, t₁, n₁), (f₂, t₂, n₂), (m, s, r))

**if** there is A ∈ {hit, miss} s.t. not ((f₁, t₁, n₁) *⊧* A iff (f₂, t₂, n₂) *⊧* A) **then** the spoiler wins, **otherwise** the spoiler chooses i ∈ {1, 2} and plays on (fᵢ, tᵢ, nᵢ). The duplicator replies on (fⱼ, tⱼ, nⱼ) where j ∈ {1, 2}⧵{i}. The spoiler **must** choose one of the following moves (else the duplicator wins).

⟨U⟩ **move**: if m ≥ 1 then the spoiler **can** choose to play a ⟨U⟩ move. It selects a node nᵢ′ ∈ 𝒩.

	- **–** The duplicator **must** reply with a node nⱼ′ ∈ 𝒩.
	- **–** The game continues on ((f₁, t₁, n₁′), (f₂, t₂, n₂′), (m−1, s, r)).

⧫ **move**: if s ≥ 1 then the spoiler **can** choose to play a ⧫ move. It selects a finite forest fᵢ′ such that fᵢ′ *⊑* fᵢ and |dom(fᵢ′)| = |dom(fᵢ)| − 1.

	- **–** The duplicator **must** reply with some fⱼ′ *⊑* fⱼ s.t. |dom(fⱼ′)| = |dom(fⱼ)| − 1.
	- **–** The game continues on ((f₁′, t₁, n₁), (f₂′, t₂, n₂), (m, s−1, r)).

⧫∗ **move**: if r ≥ 1 then the spoiler **can** choose to play a ⧫∗ move. It selects a forest fᵢ′ *⊑* fᵢ.

	- **–** The duplicator **must** reply with some fⱼ′ *⊑* fⱼ.
	- **–** The game continues on ((f₁′, t₁, n₁), (f₂′, t₂, n₂), (m, s, r−1)).
**Fig. 3.** Ehrenfeucht-Fraïssé games for ALT

## **3.2 Inexpressibility results via the Ehrenfeucht-Fraïssé games for** ALT

Now that we are more familiar with the logic, before completing the TOWER-hardness proof of SAT(ALT) we show some properties that ALT cannot express. Notably, these properties explain why the TOWER-hardness proof of the next section cannot be easily simplified. Moreover, inexpressibility results effectively reduce the set of forests that must be considered in order to solve SAT(ALT). This in turn makes reductions from SAT(ALT) to other logics more immediate, as we show throughout Section 4.

A standard way of proving inexpressibility results for logics interpreted on finite models is by adapting Ehrenfeucht-Fraïssé games [29], as done for other relation-changing logics such as context logic for trees [10] and ambient logic [16].

We define the *rank* of a formula φ as the triple (m, s, r) ∈ ℕ³ where the *modal rank* m is the greatest nesting depth of the modal operator ⟨U⟩ in φ, whereas the *sabotage rank* s (resp. *repeated sabotage rank* r) is the greatest nesting depth of the ⧫ (resp. ⧫∗) operator in φ. We denote with ALT(ρ) the set of formulae with rank ρ ∈ ℕ³.
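Assuming formulae encoded as nested tuples such as ("U", ("sab", ("hit",))) (a toy encoding of ours, not the paper's notation), the rank can be computed recursively:

```python
def rank(phi):
    """Rank (m, s, r): nesting depths of <U>, ⧫ and ⧫* in a tuple-encoded formula."""
    op = phi[0]
    subranks = [rank(sub) for sub in phi[1:] if isinstance(sub, tuple)]
    m = max((x[0] for x in subranks), default=0)
    s = max((x[1] for x in subranks), default=0)
    r = max((x[2] for x in subranks), default=0)
    if op == "U":       # somewhere modality
        m += 1
    elif op == "sab":   # sabotage
        s += 1
    elif op == "sab*":  # repeated sabotage
        r += 1
    return (m, s, r)
```

For example, ⟨U⟩(hit ∧ ¬⧫miss) has rank (1, 1, 0).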

The Ehrenfeucht-Fraïssé games (EF-games) for ALT are formally defined in Figure 3. A game is played by two players: the *spoiler* and the *duplicator*. A game state ((f₁, t₁, n₁), (f₂, t₂, n₂), ρ) is a triple made of a rank ρ and two pointed forests (f₁, t₁, n₁) and (f₂, t₂, n₂). The goal of the spoiler is to show that the two structures are different; the goal of the duplicator is to counter the spoiler and show that the two structures are similar. Let us make clear what we mean by two models being different: both players can only play following the rules of the logical formalism (in our case, ALT). Then, two models are different if and only if there is a formula φ ∈ ALT(ρ) that is satisfied by only one of the two models. This correspondence between the game and the logic is expressed by an adequacy result, formalised below in Lemma 3.

A player has a *winning strategy* if it can play in a way that guarantees victory, regardless of what the other player does. We write (f₁, t₁, n₁) ≈ρ (f₂, t₂, n₂) whenever the duplicator has a winning strategy for the game ((f₁, t₁, n₁), (f₂, t₂, n₂), ρ). By Martin's theorem [32] our games are determined: if the duplicator does not have a winning strategy then the spoiler has one, and vice versa. Hence, (f₁, t₁, n₁) ̸≈ρ (f₂, t₂, n₂) means that the spoiler has a winning strategy.

**Lemma 3.** (f₁, t₁, n₁) ̸≈ρ (f₂, t₂, n₂) *iff* ∃φ ∈ ALT(ρ) *s.t.* (f₁, t₁, n₁) *⊧* φ *and* (f₂, t₂, n₂) ̸*⊧* φ*.*

The left-to-right direction is proved by induction on the rank ρ and by cases on the first move that the spoiler makes in his winning strategy. The other direction is proved by structural induction on φ. We first use the EF-games to derive three easy results.

## **Lemma 4.** *Let be a formula.*


*Proof (sketch).* We sketch the proof of (1) to show how EF-games are used. Let us consider a pointed forest (f, t, n) such that (f, t, n) *⊧* φ. We take a node t′ ∉ dom(f) ∪ ran(f) and define the forest f′ by f′(n′) = **if** f(n′) = t **then** t′ **else** f(n′). Notice that t′ ∉ dom(f′). We then prove ∀ρ ∈ ℕ³, (f, t, n) ≈ρ (f′, t′, n), by induction on ρ, leading to (1) directly by Lemma 3. The proof of (3) essentially follows from (2). *⊓⊔*

Interestingly enough, the third statement of Lemma 4 implies that requiring 𝒩 to be finite, instead of infinite as we do throughout this work, changes neither the expressive power nor the complexity of ALT.

Let (f, t, n) be a pointed forest. We now show that ALT has a very limited expressive power with respect to the garbage nodes. In particular, it can only check for the membership of n in 𝒢(f) (with the formula miss) and for the size of 𝒢(f) (with the formulae |𝒢| ≥ β). We formalise this inexpressibility result as follows.

**Lemma 5.** *Let* ρ = (m, s, r)*. Let* f, f₁ *and* f₂ *be three forests and let* t, n ∈ 𝒩 *be such that for every* i ∈ {1, 2}*,* f *⊑* fᵢ *and* 𝒢ₜ(fᵢ) = dom(fᵢ) ⧵ dom(f)*. If we have*

n ∈ 𝒢ₜ(f₁) *iff* n ∈ 𝒢ₜ(f₂), *and* min(|𝒢ₜ(f₁)|, m + s + r) = min(|𝒢ₜ(f₂)|, m + s + r),

*then* (f₁, t, n) ≈ρ (f₂, t, n)*.*

Let us informally explain Lemma 5, whose proof is by induction on ρ and by cases on the moves of the spoiler. Let (f₁, t, n) be a pointed forest and suppose (ad absurdum) that it satisfies a formula φ of rank ρ that expresses a property of the garbage nodes different from the ones cited above. For example, let us assume that φ characterises the set of pointed forests having a garbage node with at least two children. Consider the subforest f *⊑* f₁ whose domain corresponds to the set of f₁-descendants of t. In particular, 𝒢ₜ(f₁) = dom(f₁) ⧵ dom(f). We extend f to a forest f₂ by (re)defining it on the nodes in 𝒢ₜ(f₁) so that 𝒢ₜ(f₂) = 𝒢ₜ(f₁) and none of these nodes has more than one f₂-child (this construction can always be done). This last equality implies that n ∈ 𝒢ₜ(f₁) ⇔ n ∈ 𝒢ₜ(f₂) and min(|𝒢ₜ(f₁)|, m + s + r) = min(|𝒢ₜ(f₂)|, m + s + r). By Lemma 5, (f₁, t, n) ≈ρ (f₂, t, n), which implies (f₂, t, n) *⊧* φ by Lemma 3. However, (f₂, t, n) is defined so that every node in 𝒢ₜ(f₂) has at most one child. Thus, φ cannot characterise the set of models having a garbage node with at least two children.

As shown in the next section, the inexpressibility result in Lemma 5 plays a central role in the development of the reduction that leads to the TOWER-hardness of SAT(ALT).

## **3.3** PITL **on marked words and the TOWER-hardness of** SAT**(**ALT**)**

We are now ready to show the non-elementarity of SAT(ALT). The proof is by reduction from the satisfiability problem of Propositional Interval Temporal Logic (PITL) under the locality principle [34,25], which is in turn shown TOWER-hard by reduction from the non-emptiness problem of star-free regular languages (see [38] for the TOWER characterisation of this problem). PITL is a well-known logic that was introduced by Moszkowski in [34] for the verification of hardware components. It is interpreted on non-empty finite words over a finite alphabet of unary symbols Σ. Its formulae are from the grammar:

$$\varphi := \varphi \land \varphi \mid \neg\varphi \mid \mathsf{a} \mid \mathsf{pt} \mid \varphi \,;\, \varphi$$

where a ∈ Σ. Under the *locality principle* interpretation, a word w = a₁···aₖ ∈ Σ⁺ satisfies a whenever a₁ = a. Moreover, w satisfies pt if it is a word of length one (i.e. w ∈ Σ). The main feature of this logic is its *chop* operator ";". Intuitively, φ ; ψ is satisfied by words that can be "chopped" into a prefix and a suffix sharing one symbol, so that the prefix satisfies φ and the suffix satisfies ψ. Formally,

$$\mathfrak{a}_1\cdots\mathfrak{a}_k \models \varphi \,;\, \psi \;\overset{\text{def}}{\Leftrightarrow}\; \text{there is } i \in [1, k] \text{ such that } \mathfrak{a}_1\cdots\mathfrak{a}_i \models \varphi \text{ and } \mathfrak{a}_i\cdots\mathfrak{a}_k \models \psi.$$

Translating φ ; ψ in ALT is not easy. Indeed, given the encoding of words proposed in Section 3.1, chopping w into two pieces means splitting in some way the main path u₁,···,uₖ of a forest (f, t, n) encoding w, to then check that the word encoded by u₁,···,uᵢ satisfies φ and the one encoded by uᵢ,···,uₖ satisfies ψ. However, by doing this the elements u₁,···,uᵢ₋₁ become garbage nodes. Thus, as a consequence of Lemma 5, ALT cannot check in any way which word is encoded by these nodes. Easy translations from PITL to ALT seem therefore impossible and, as done in [31], we are required to go through an alternative interpretation of PITL based on *marking symbols* instead of chopping words.

A *marking* of an alphabet Σ is a bijection (̄·) ∶ Σ → Σ̄, relating a symbol a ∈ Σ to its *marked variant* ā ∈ Σ̄. We denote with ⅀ the extended alphabet Σ *⊎* Σ̄. A word is *marked* if it contains some symbols from Σ̄. We introduce the satisfaction relation *⊧*<sup>∙</sup> on marked words w ∈ ⅀⁺. It is defined as usual for Boolean connectives. Moreover,

w *⊧*<sup>∙</sup> a ⇔ w is headed by a or ā; w *⊧*<sup>∙</sup> pt ⇔ w is headed by a marked symbol.

The definition of φ ; ψ is more involved. Let w′ ∈ Σ∗, a ∈ Σ and w″ ∈ ⅀∗ be such that w = w′ ā w″, so that ā is the first marked symbol occurring in w (this decomposition is uniquely defined). Then, w′ ā w″ *⊧*<sup>∙</sup> φ ; ψ holds if and only if there is b ∈ Σ satisfying one of four cases (a)–(d).


Under this semantics, the satisfaction of a formula only depends on the prefix w′ā of w that ends with the first marked symbol. To check whether w *⊧*<sup>∙</sup> φ ; ψ we search for a position i inside this prefix so that φ is satisfied by the word obtained from w by marking the i-th symbol, whereas ψ is satisfied by the suffix of w starting at position i. In the definition above, this idea is split into four cases (a)–(d), depending on whether the position i is the first one of the prefix and whether it is the last one. This is done as it better reflects the encoding of PITL in ALT. The semantics on marked words is related to the standard semantics of PITL as follows.

**Proposition 1 (from [31]).** *Let* w ∈ Σ∗*,* a ∈ Σ *and* w′ ∈ ⅀∗*. Let* φ *be a formula in* PITL*.* wa *satisfies* φ *under the standard interpretation of* PITL *if and only if* w ā w′ *⊧*<sup>∙</sup> φ*.*

The alternative interpretation of PITL allows us to reduce SAT(PITL) to SAT(ALT) in a neat way. Let Σ = [1, n], ⅀ = Σ ∪ Σ̄ and let g ∶ ⅀ → [1, 2n] be the bijection defined by g(a) = 2a for a ∈ Σ and g(ā) = 2a − 1 for ā ∈ Σ̄. g(a₁···aₖ) denotes the word g(a₁)···g(aₖ). Thus g maps words over ⅀ into words over the alphabet [1, 2n], which can be encoded into trees (as in Section 3.1). In these trees each symbol a ∈ Σ (resp. ā ∈ Σ̄) corresponds to a node in the main path having 2a + 1 (resp. 2a) children not in this path. Therefore, given a node u encoding a symbol in Σ, removing exactly one child of u that is not in the main path is equivalent to marking the symbol u encodes. Based on this description, we can check whether the current evaluation node encodes a marked symbol from Σ̄ with the following formula:

$$\mathsf{mark}_{\Sigma} \overset{\text{def}}{=} \bigvee_{\mathsf{a} \in \Sigma} \Big( (\#\mathsf{child} = 2\mathsf{a} \land \mathsf{1st}_{[1,2\mathsf{a}]}) \lor (\#\mathsf{child} = 2\mathsf{a} + 1 \land \neg\, \mathsf{1st}_{[1,2\mathsf{a}]}) \Big)$$
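The bijection g itself is straightforward to implement; in this sketch (our naming, not the paper's) a symbol of ⅀ is a pair (a, marked):

```python
def g(a: int, marked: bool = False) -> int:
    """The bijection from Σ ⊎ Σ̄ to [1, 2n]: a ↦ 2a and ā ↦ 2a − 1."""
    return 2 * a - 1 if marked else 2 * a

def g_word(word):
    """Apply g symbol-wise to a word given as a list of (symbol, marked) pairs."""
    return [g(a, marked) for a, marked in word]
```

g is indeed a bijection: for n = 3 the six symbols of ⅀ are sent exactly onto [1, 6].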

As already stated, *⊧*<sup>∙</sup> examines the prefix of w that ends with the first marked symbol. In pointed forests (f, t, n) encoding w, this prefix corresponds to the subtree whose root encodes a marked symbol and is a descendant of every other node encoding a marked symbol. Therefore, to characterise this tree we need to track the number of nodes encoding marked symbols. We first define a formula marked ≥ β stating that the forest has at least β ∈ ℕ nodes encoding marked symbols. It is defined as *⊤* for β = 0, and otherwise (β ≥ 1) as ⟨U⟩(mark_Σ ∧ ⧫(¬inDom ∧ marked ≥ β−1)). Again, this formula uses the same principle introduced in Section 2 for #ch_trg ≥ 2: we search for a node encoding a marked symbol, remove it from the structure and then search for β−1 other such nodes. Similarly, we introduce #marked ≥ β = symb ∧ ⧫(¬inDom ∧ marked ≥ β), the formula stating that the current evaluation node encodes a symbol and has at least β ancestors that encode marked symbols.

At last, for a formula φ in PITL having symbols from Σ = [1, n], we introduce its translation ∇_β(φ) in ALT, where β ≥ 1 tracks the number of nodes encoding marked symbols. The translation is homomorphic for Boolean connectives: ∇_β(¬φ) = ¬∇_β(φ) and ∇_β(φ ∧ ψ) = ∇_β(φ) ∧ ∇_β(ψ). For a ∈ Σ and ā ∈ Σ̄, it faithfully represents the *⊧*<sup>∙</sup> relation: ∇_β(a) = ⟨U⟩ 1st_{[2a−1,2a]} and ∇_β(ā) = ⟨U⟩(1st_{[1,2a]} ∧ mark_Σ). Lastly, the formula ∇_β(φ₁ φ₂) for the chop operator is defined as

$$\begin{split}
\langle\mathsf{U}\rangle\Big(\ & \big(\mathsf{1st}_{[1,2n]} \land \mathsf{mark}_{\Sigma} \land \nabla_{\beta}(\varphi_1) \land \nabla_{\beta}(\varphi_2)\big)\ \lor\\
& \big(\mathsf{1st}_{[1,2n]} \land \neg\mathsf{mark}_{\Sigma} \land \blacklozenge(\mathsf{mark}_{\Sigma} \land \nabla_{\beta+1}(\varphi_1)) \land \nabla_{\beta}(\varphi_2)\big)\ \lor\\
& \big(\neg\mathsf{1st}_{[1,2n]} \land \mathsf{mark}_{\Sigma} \land \mathsf{marked}_{\geq\beta-1} \land \nabla_{\beta}(\varphi_1) \land \blacklozenge(\mathsf{1st}_{[1,2n]} \land \nabla_{\beta}(\varphi_2))\big)\ \lor\\
& \big(\neg\mathsf{1st}_{[1,2n]} \land \neg\mathsf{mark}_{\Sigma} \land \mathsf{marked}_{\geq\beta} \land \blacklozenge(\mathsf{mark}_{\Sigma} \land \nabla_{\beta+1}(\varphi_1)) \land \blacklozenge(\mathsf{1st}_{[1,2n]} \land \nabla_{\beta}(\varphi_2))\big)\Big).
\end{split}$$

Notice how ∇_β(φ₁ φ₂) follows closely the *⊧*<sup>∙</sup> relation: it is split into four disjuncts, one for each case in the definition of the relation. For example, the second disjunct of ∇_β(φ₁ φ₂) encodes case (b) in the definition of w′w′′ *⊧*<sup>∙</sup> φ₁ φ₂, as schematised below:

$$\begin{array}{c|l}
\text{PITL} & \exists b \in \Sigma,\ \exists w_2 \in \Sigma^{*} \text{ s.t. } w' = b\,w_2, \quad \bar{b}\,w_2\,w'' \models^{\bullet} \varphi_1 \quad\text{and}\quad b\,w_2\,w'' \models^{\bullet} \varphi_2\\
\hline
\text{ALT} & \langle\mathsf{U}\rangle(\,\mathsf{1st}_{[1,2n]} \land \neg\mathsf{mark}_{\Sigma} \quad\land\ \blacklozenge(\mathsf{mark}_{\Sigma} \land \nabla_{\beta+1}(\varphi_1)) \quad\land\ \nabla_{\beta}(\varphi_2)\,)
\end{array}$$

The translation is proved correct (by induction on the structure of the formula) in the next lemma.

**Lemma 6.** *Let Σ = [1, n] and ⅀ = Σ ∪ Σ̄. Let w ∈ ⅀⁺ be a word with β ≥ 1 marked symbols, and let (F, t, n) be a pointed forest encoding w. For every φ in PITL, w ⊧• φ iff (F, t, n) ⊧ ∇_β(φ).*

Then, the reduction from SAT(PITL) under the standard semantics follows, as we are able to characterise the set of pointed forests encoding words (first three conjuncts in the formula of Lemma 7). To conclude, we simply apply Lemma 6 and Proposition 1.

**Lemma 7.** *Every φ in* PITL *written with symbols from Σ = [1, n] is satisfiable under the standard interpretation of* PITL *if and only if the following formula in* ALT *is satisfiable:*

> 1st_{[1,2n]} ∧ ⟨U⟩ mark_Σ ∧ [U](mark_Σ ⇔ ∧ ¬⧫()) ∧ ∇₁(φ).

(Here, the first conjuncts state that the forest encodes a non-empty word, and that the only node encoding a marked symbol is the child of the target node.)

Because of the case distinction in the formula ∇_β(φ₁ φ₂), the formula obtained via ∇ is exponential (hence elementary) in the number of symbols used to write φ. Therefore, from the TOWER-hardness of SAT(PITL) we conclude that SAT(ALT) is TOWER-hard.

## **4 Revisiting TOWER-hard logics with** ALT

We now display the usefulness of ALT as a tool for proving the TOWER-hardness of logics interpreted on tree-like structures. In particular, we provide semantically faithful reductions from SAT(ALT) to the satisfiability problems of four logics that were independently found to be TOWER-complete: first-order separation logic [9], quantified CTL on trees [28], modal logic of heaps [17] and modal separation logic [18]. Our reductions use only strict fragments of these formalisms, allowing us to draw some new results on these logics. Most notably, this section shows that all these logics are TOWER-hard because they fundamentally provide the reachability and submodel reasoning given by ALT.

## **4.1 From** ALT **to First-Order Separation Logic**

Separation logic (SL) [37] is an assertion logic used in state-of-the-art tools [6,11] for Hoare-style verification of heap-manipulating programs. As already stated, a preliminary version of ALT was introduced in [31] to reason about the complexity of separation logic. Hence, here we briefly revisit the relation between ALT and SL.

Let VAR and LOC be two countably infinite sets of program variables and locations, respectively. Separation logic is interpreted on *memory states*: pairs (s, h) consisting of a function (the *store*) s: VAR → LOC and a partial function with finite domain (the *heap*) h: LOC ⇀ LOC. Since VAR and LOC are both countably infinite sets, w.l.o.g. we assume LOC = ℕ. We extend the notation of domain, image and function composition to stores and heaps. Two heaps h₁ and h₂ are said to be disjoint, written h₁ ⊥ h₂, whenever dom(h₁) ∩ dom(h₂) = ∅; when this holds, the union h₁ + h₂ of h₁ and h₂ is defined as the standard sum of functions: (h₁ + h₂)(ℓ) = **if** ℓ ∈ dom(h₁) **then** h₁(ℓ) **else** h₂(ℓ). Let u ∈ VAR be a *fixed variable* that is reserved for quantification (quantification over other variables is not possible). We consider the separation logic SL(∗, alloc, ↪⁺), whose formulae are built from the following grammar (as in [31]):

φ := *⊤* ∣ φ ∧ φ ∣ ¬φ ∣ emp ∣ x = y ∣ x ↪ y ∣ alloc(x) ∣ x ↪⁺ y ∣ φ ∗ φ ∣ ∃u φ

where x, y ∈ VAR. As shown below, the *reachability predicate* x ↪⁺ y can be seen as the transitive closure of the standard *points-to* predicate x ↪ y of separation logic. For a memory state (s, h), the satisfaction relation *⊧* is defined as follows:

(s, h) *⊧* emp ⇔ dom(h) = ∅. (s, h) *⊧* x ↪ y ⇔ h(s(x)) = s(y). (s, h) *⊧* x = y ⇔ s(x) = s(y). (s, h) *⊧* alloc(x) ⇔ s(x) ∈ dom(h). (s, h) *⊧* x ↪⁺ y ⇔ there is δ ≥ 1 such that h^δ(s(x)) = s(y). (s, h) *⊧* φ ∗ ψ ⇔ ∃h₁, h₂ s.t. h₁ ⊥ h₂, h₁ + h₂ = h, (s, h₁) *⊧* φ and (s, h₂) *⊧* ψ. (s, h) *⊧* ∃u φ ⇔ there is a location ℓ ∈ LOC such that (s[u←ℓ], h) *⊧* φ,

where s[u←ℓ] is the store updated from s by only changing the evaluation of u from s(u) to ℓ, i.e. for every x ∈ VAR, s[u←ℓ](x) = **if** x = u (syntactically) **then** ℓ **else** s(x). The main ingredient of separation logic is the *separating conjunction* φ ∗ ψ, which is satisfied whenever h can be partitioned into h₁ and h₂ so that (s, h₁) *⊧* φ whereas (s, h₂) *⊧* ψ. The ∗ operator captures the ⧫ and ⧫∗ operators as follows. Consider the formula size = 1 = ¬emp ∧ ¬(¬emp ∗ ¬emp), which is satisfied whenever |dom(h)| = 1. We define ⧫SL φ = (size = 1) ∗ φ and ⧫∗SL φ = *⊤* ∗ φ. The semantics of these formulae is related to the analogous operators of ALT as follows:

$$\begin{aligned} &(s,h) \models \blacklozenge_{\mathrm{SL}}\, \varphi \iff \exists h_1, h_2 \text{ s.t. } h_1 \bot h_2,\ h_1 + h_2 = h,\ |\mathrm{dom}(h_1)| = 1 \text{ and } (s, h_2) \models \varphi. \\ &(s, h) \models \blacklozenge^{*}_{\mathrm{SL}}\, \varphi \iff \exists h_1, h_2 \text{ s.t. } h_1 \bot h_2,\ h_1 + h_2 = h \text{ and } (s, h_2) \models \varphi. \end{aligned}$$
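These two clauses can be prototyped directly on finite heaps: enumerate the disjoint splittings h = h₁ + h₂ demanded by ∗, and derive ⧫SL and ⧫∗SL from the separating conjunction exactly as in the definitions above. A small Python sketch (names are ours; the store is omitted, since these derived operators only touch the heap):

```python
from itertools import combinations

def splittings(h):
    """All disjoint splittings h = h1 + h2 of a finite heap,
    represented as a dict mapping locations to locations."""
    locs = list(h)
    for r in range(len(locs) + 1):
        for chosen in combinations(locs, r):
            h1 = {l: h[l] for l in chosen}
            h2 = {l: h[l] for l in h if l not in h1}
            yield h1, h2

def sep_conj(h, phi1, phi2):
    """(s,h) |= phi1 * phi2: some splitting satisfies phi1 and phi2."""
    return any(phi1(h1) and phi2(h2) for h1, h2 in splittings(h))

def diamond_sl(h, phi):       # ⧫_SL phi = (size = 1) * phi
    return sep_conj(h, lambda h1: len(h1) == 1, phi)

def diamond_star_sl(h, phi):  # ⧫*_SL phi = ⊤ * phi
    return sep_conj(h, lambda h1: True, phi)
```

For instance, on h = {1: 2, 2: 3}, removing exactly one cell always leaves a heap of size one, so `diamond_sl(h, lambda h2: len(h2) == 1)` holds.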

In order to perform the reduction from SAT(ALT) to SAT(SL(∗, alloc, ↪⁺)), we fix a variable t ∈ VAR that is syntactically different from u and that plays the role of the target node. Then, the translation τ(φ) of a formula φ in ALT is straightforward:

τ(Hit) = u ↪⁺ t. τ(⧫φ) = ⧫SL τ(φ). τ(*⊤*) = *⊤*. τ(Miss) = alloc(u) ∧ ¬τ(Hit). τ(⧫∗φ) = ⧫∗SL τ(φ). τ(¬φ) = ¬τ(φ). τ(⟨U⟩φ) = ∃u τ(φ). τ(φ ∧ ψ) = τ(φ) ∧ τ(ψ).

Given a pointed forest (F, t, n) and a store s such that s(t) = t and s(u) = n, by structural induction on φ we can easily show that (F, t, n) *⊧* φ ⇔ (s, F) *⊧* τ(φ). This, together with the fact that ∀u ¬(u ↪⁺ u) characterises the class of acyclic heaps (which correspond to the forests of ALT), directly implies the following result.

**Lemma 8.** *Let t ∈ VAR ⧵ {u}. φ in* ALT *and* τ(φ) ∧ ∀u ¬(u ↪⁺ u) *are equisatisfiable.*
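The acyclicity side condition ∀u ¬(u ↪⁺ u) of Lemma 8 is easy to decide operationally by chasing the heap from every location. A minimal Python sketch (our naming; heaps as dicts from locations to locations):

```python
def acyclic(h):
    """True iff no location reaches itself through the heap h, i.e. the
    constraint expressed by the formula ∀u ¬(u ↪+ u): starting from any
    location, following h never returns to an already visited one."""
    for start in h:
        seen, cur = {start}, h[start]
        while cur in h:
            if cur in seen:
                return False
            seen.add(cur)
            cur = h[cur]
    return True
```

A heap shaped like a forest (e.g. {1: 2, 2: 3}) passes the check, while any self-loop or longer cycle fails it.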

This lemma reproves that both SL(∗, alloc, ↪⁺) and first-order separation logic with two quantified variables admit a TOWER-hard satisfiability problem. The latter logic, as introduced in [17], can be defined from SL(∗, alloc, ↪⁺) by removing alloc and ↪⁺ from the syntax and allowing a second variable, different from u, to be quantified. However, in [17] the authors show that both alloc and ↪⁺ are expressible in this logic, and with some very minor modifications to their formulae we can show that both predicates are definable using ⧫SL and ⧫∗SL instead of ∗ and emp. Moreover, these logics are in TOWER by Rabin's theorem [36], leading to the TOWER-completeness of SAT(ALT).

**Theorem 1.** *The satisfiability problems of first-order separation logic with two quantified variables and of* SL(∗, alloc, ↪⁺) *are* TOWER*-complete, even when* emp *and* ∗ *are replaced with* ⧫SL *and* ⧫∗SL*.* SAT(ALT) *is* TOWER*-complete.*

#### **4.2 From** ALT **to Quantified Computation Tree Logic**

We now consider Computation Tree Logic (CTL), a well-known logic for branching-time model checking [14,13]. Among its extensions, the addition of propositional quantification is considered in [5,22,28]. The satisfiability problem of the resulting logic is undecidable on Kripke structures, and TOWER-complete on trees [28]. In [5], the authors show that the problem is TOWER-hard even when considering just one operator among *exists-next* and *exists-finally* (the definitions are below). Here, we reprove the result for exists-finally by first tackling the TOWER-hardness of the logic with the *exists-until* modality E(φ U ψ), and then showing that this operator can be defined using exists-finally. Unlike [5], and thanks to the properties of ALT, our reduction does not nest until operators, showing that this extension of CTL remains TOWER-hard even when E(φ U ψ) is restricted so that φ and ψ are Boolean combinations of propositional symbols.

Let us first recall the standard definition of Kripke structures [27]. Let AP = {p, q, ...} be a countable set of *propositional symbols*. A *Kripke structure* is a triple (W, R, V) where W is a countable set of *worlds*, R ⊆ W × W is a left-total *accessibility relation* (left-total means that for each world w ∈ W there is w′ ∈ W s.t. (w, w′) ∈ R) and V: AP → 2^W is a *labelling function*. We define R(w) = {w′ ∈ W ∣ (w, w′) ∈ R} as the set of worlds accessible from w ∈ W. Let R ⊆ W × W be an arbitrary relation on worlds (not necessarily left-total). A *path* starting in w is a sequence of worlds (w₀, w₁, ...) such that w₀ = w and (wᵢ, wᵢ₊₁) ∈ R for every two successive elements wᵢ, wᵢ₊₁ of the sequence. The path is said to be *maximal* whenever it is not a strict prefix of any other path. We denote with Π_R(w) the set of *maximal paths* starting in w. If R is left-total, then Π_R(w) is the set of all infinite paths starting in w. Lastly, R*(w) denotes the set of worlds reachable from w, i.e. those worlds belonging to a path in Π_R(w).
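The set R*(w) of reachable worlds can be computed with a standard breadth-first traversal. A short sketch (our notation) representing R as a dict from worlds to sets of successors:

```python
from collections import deque

def reachable(R, w):
    """R*(w): all worlds reachable from w (including w itself), where R
    maps each world to its possibly empty set of successors."""
    seen, todo = {w}, deque([w])
    while todo:
        u = todo.popleft()
        for v in R.get(u, ()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen
```

On a left-total relation this enumerates the worlds lying on the infinite paths of Π_R(w); on an arbitrary finite relation it simply follows the edges as far as they go.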

We consider Quantified Computation Tree Logic interpreted under tree semantics (QCTL_T) and refer the reader to [28] for a complete description of the logic. The formulae of QCTL_T are built from the following grammar:

$$\varphi \colon = \top \mid \varphi \land \varphi \mid \neg \varphi \mid p \mid \mathsf{EX} \varphi \mid \mathsf{E}(\varphi \mathsf{U} \varphi) \mid \mathsf{A}(\varphi \mathsf{U} \varphi) \mid \exists p \, \varphi$$

where p ∈ AP. All temporal modalities of QCTL_T are from CTL: EX is the *exists-next* modality, E(φ U ψ) is the *exists-until* modality and A(φ U ψ) is the *all-until* modality.

QCTL_T is interpreted on Kripke trees. Formally, a Kripke structure (W, R, V) is a *(finitely-branching) Kripke tree* if (I) R⁻¹ is functional and acyclic, (II) for every world w ∈ W, R(w) is finite, and (III) it has a *root*, i.e. R*(r) = W for some r ∈ W. Given w ∈ W, the worlds in R*(w) ⧵ {w} are said to be *descendants* of w. As Kripke structures are left-total, Kripke trees can be seen as finitely-branching infinite trees. This leads to SAT(QCTL_T) being in TOWER, by reduction to MSO on trees [28]. Let K = (W, R, V) be a Kripke tree and w ∈ W. The satisfaction relation *⊧* of QCTL_T is defined as:

$$\begin{array}{ll}
(\mathcal{K}, w) \models p & \stackrel{\text{def}}{\Leftrightarrow} w \in \mathcal{V}(p).\\
(\mathcal{K}, w) \models \mathsf{EX}\,\varphi & \stackrel{\text{def}}{\Leftrightarrow} \exists w' \in \mathcal{R}(w) \text{ s.t. } (\mathcal{K}, w') \models \varphi.\\
(\mathcal{K}, w) \models \mathsf{E}(\varphi\,\mathsf{U}\,\psi) & \stackrel{\text{def}}{\Leftrightarrow} \text{there are } (w_0, w_1, \dots) \in \Pi_{\mathcal{R}}(w) \text{ and } j \in \mathbb{N} \text{ such that}\\
& \quad (\mathcal{K}, w_j) \models \psi \text{ and for every } i < j,\ (\mathcal{K}, w_i) \models \varphi.\\
(\mathcal{K}, w) \models \mathsf{A}(\varphi\,\mathsf{U}\,\psi) & \stackrel{\text{def}}{\Leftrightarrow} \text{for all } (w_0, w_1, \dots) \in \Pi_{\mathcal{R}}(w) \text{ there is } j \in \mathbb{N} \text{ such that}\\
& \quad (\mathcal{K}, w_j) \models \psi \text{ and for every } i < j,\ (\mathcal{K}, w_i) \models \varphi.\\
(\mathcal{K}, w) \models \exists p\,\varphi & \stackrel{\text{def}}{\Leftrightarrow} \text{there is } \mathcal{W}' \subseteq \mathcal{W} \text{ such that } ((\mathcal{W}, \mathcal{R}, \mathcal{V}[p \leftarrow \mathcal{W}']), w) \models \varphi.
\end{array}$$

where, similarly to the store update s[u←ℓ] of the previous section, V[p←W′] stands for the function obtained from V by updating the evaluation of p from V(p) to W′.

The formula ∃p φ requires updating the satisfaction of p in a way such that φ is satisfied. This should already give a good clue on how to reduce ALT to QCTL_T: we represent the nodes of a forest as the set of worlds satisfying a propositional symbol D. Then, for instance, the repeated sabotage operator ⧫∗ is encoded by an existential ∃E that changes the evaluation of a propositional symbol E so that it only holds in worlds where D holds. In this way, the set of worlds satisfying E represents a subforest of the original one. The universal quantification ∀p and the connectives ⇒ and ∨ are defined as usual. So are the classical temporal operators from [14]: *exists-finally* EF φ = E(*⊤* U φ), *all-generally* AG φ = ¬EF ¬φ, *all-finally* AF φ = A(*⊤* U φ), *exists-generally* EG φ = ¬AF ¬φ, and *exists-strong-release* E(φ M ψ) = E(ψ U (φ ∧ ψ)).

We now work towards a formal encoding of a pointed forest (F, t, n) into a *pointed model* (K, w), where K = (W, R, V) is a Kripke tree and w is one of its worlds. We use w to play the role of the target node t. To encode the forest and the current evaluation node we use the worlds appearing in R*(w) and three propositional symbols: D, end and n. The intended use of D is to state which elements of R*(w) encode nodes in dom(F). We need to be careful here, as R*(w) is an infinite set whereas dom(F) is finite. We use the propositional symbol end to solve this inconsistency: we constrain K to satisfy the formula AF end, stating that every maximal path (w₀, w₁, ...) ∈ Π_R(w) has a finite prefix (w₀, ..., w_{k−1}) (k ∈ ℕ) of worlds not satisfying end, whereas w_k ∈ V(end). Then, a world in W encodes an element of dom(F) whenever it satisfies D and belongs to one of these prefixes. We use the propositional symbol n to encode the current evaluation node. During the translation we require n to be satisfied by exactly one descendant of w, so that the modality ⟨U⟩ roughly becomes a quantification over n. From [28], checking whether a formula φ holds in exactly one descendant of w can be done with the formula uniq(φ) = EF φ ∧ ∀q (EF(φ ∧ q) ⇒ AG(φ ⇒ q)), where q ∈ AP does not appear in φ. For technical reasons, we treat in a similar way the world w, which encodes the target node, and require it to be the only world (among the ones in R*(w)) satisfying the auxiliary propositional symbol t. Lastly, we use an additional propositional symbol E in order to encode subforests and deal with the encoding of ⧫ and ⧫∗ (as stated above).

We now formalise the encoding. For the remainder of this section, we fix a tuple P = (end, n, t) of three distinct propositional symbols. Let D be an additional symbol not in P, and let (F, t, n) be a pointed forest s.t. n ∉ dom(F) (by Lemma 4(1) it is sufficient to consider this class of structures in order to decide the satisfiability of a formula in ALT). A pointed model (K = (W, R, V), w) is a (P, D)*-encoding* of (F, t, n), or simply an *encoding* when (P, D) is clear from the context, if there is an injection from the nodes of the forest to R*(w) s.t.

- for every maximal path (w₀, w₁, ...) ∈ Π_R(w), writing k for the position of its first world satisfying end: for every i ∈ [0, k − 1], wᵢ ∉ V(end) and (wᵢ ∈ V(D) ⇔ some node n′ ∈ dom(F) is mapped to wᵢ by the injection);
- for every i ≥ k and every node n′ ∈ dom(F), n′ is not mapped to wᵢ by the injection.

It is easy to show that such an encoding always exists. Informally, the first property states that w encodes the target node t and is the only world in R*(w) satisfying t. Similarly, the world encoding n is the only world in R*(w) that satisfies n. The second property states that the forest must be correctly encoded in the Kripke structure. In particular, notice that the parent relation of the finite forest is inverted, so that it becomes the child relation in the

**Fig. 4.** A pointed forest (left) and one of its encodings as a finitely-branching Kripke tree (right).

Kripke structure (as shown in Figure 4). As the encoding is injective, it does not merge together trees that are disconnected in the forest. Lastly, the third property states that the elements in dom(F) must be encoded by worlds in R*(w) that precede every world satisfying end. Moreover, among all the descendants of w preceding end, the worlds encoding dom(F) are the only ones satisfying D. This implies that w does not satisfy D (as t ∉ dom(F)). Figure 4 shows a pointed forest and one of its possible encodings.

We now formalise the translation. Fix two distinct symbols D, E not in P. In order to alternate between D and E, we define D̄ = E and Ē = D. The translation τ_u(φ) of a formula φ in ALT, implicitly parametrised by P and where u ∈ {D, E}, is homomorphic for *⊤* and Boolean connectives (as in SL, see Section 4.1), and otherwise it is defined as

$$\begin{array}{ll}
\tau_{\mathsf{u}}(\mathsf{Hit}) \stackrel{\text{def}}{=} \mathsf{E}(((\mathsf{u} \lor t) \land \neg end)\,\mathsf{U}\,(\mathsf{u} \land n)). & \tau_{\mathsf{u}}(\langle\mathsf{U}\rangle\varphi) \stackrel{\text{def}}{=} \exists n\,(\mathsf{uniq}(n) \land \tau_{\mathsf{u}}(\varphi)).\\
\tau_{\mathsf{u}}(\mathsf{Miss}) \stackrel{\text{def}}{=} \mathsf{E}(\neg end\,\mathsf{U}\,(\mathsf{u} \land n)) \land \neg\tau_{\mathsf{u}}(\mathsf{Hit}). & \tau_{\mathsf{u}}(\blacklozenge^{*}\varphi) \stackrel{\text{def}}{=} \exists\overline{\mathsf{u}}\,(\mathsf{AG}(\overline{\mathsf{u}} \Rightarrow \mathsf{u}) \land \tau_{\overline{\mathsf{u}}}(\varphi)).\\
\multicolumn{2}{l}{\tau_{\mathsf{u}}(\blacklozenge\varphi) \stackrel{\text{def}}{=} \exists\overline{\mathsf{u}}\,(\mathsf{AG}(\overline{\mathsf{u}} \Rightarrow \mathsf{u}) \land \mathsf{uniq}(\mathsf{u} \land \neg\overline{\mathsf{u}}) \land \mathsf{E}(\neg end\,\mathsf{U}\,(\mathsf{u} \land \neg\overline{\mathsf{u}})) \land \tau_{\overline{\mathsf{u}}}(\varphi)).}
\end{array}$$

Let (F, t, n) be a pointed forest s.t. n ∉ dom(F) and let ((W, R, V), w) be one of its (P, u)-encodings. For instance, τ_u(Hit) requires that there is a path (w, ρ₁, ..., ρ_k) starting in w whose worlds do not satisfy end and must satisfy u or t. Moreover, the last world ρ_k must satisfy u and n. From property (1) of the definition of encodings, the only element satisfying t is w, which does not satisfy u (as t ∉ dom(F)). Then, this path of worlds encodes a path in the pointed forest, from the current evaluation node (which is encoded by the only world satisfying n) to the target node t. The translation is shown correct (by structural induction on φ) for pointed forests that admit an encoding.

**Lemma 9.** *Let (F, t, n) be a pointed forest s.t. n ∉ dom(F), and let (K, w) be a (P, u)-encoding of (F, t, n). Given a formula φ in* ALT*, (F, t, n) ⊧ φ if and only if (K, w) ⊧ τ_u(φ).*

Then, to conclude the reduction we just need to characterise the set of models encoding a pointed forest. The formula ¬D ∧ t ∧ uniq(t) ∧ uniq(n) ∧ AF end does the job.

**Lemma 10.** *φ in* ALT *and* ¬D ∧ t ∧ uniq(t) ∧ uniq(n) ∧ AF end ∧ τ_D(φ) *in* QCTL_T *are equisatisfiable.*

We now take a closer look at the translation. Given a temporal modality O and k ∈ ℕ ∪ {ω}, QCTL_T(O^k) denotes the fragment of QCTL_T restricted to formulae where the only temporal modality allowed is O, which can be nested at most k times (ω stands for an arbitrary number of nestings). For instance, QCTL_T(EX^k) denotes the set of formulae restricted to the operator EX, which can be nested at most k times. This fragment of QCTL_T is shown to be k-NEXPTIME-hard in [5], which directly leads to the TOWER-hardness of QCTL_T(EX^ω) and QCTL_T(EF^ω). By analysing our translation, it is easy to show that QCTL_T(EU^0), i.e. QCTL_T restricted to the only modality E(φ U ψ) where φ and ψ are Boolean combinations of propositional symbols, and QCTL_T(EF^1) are already TOWER-hard. First of all, the formula E(φ U ψ) in QCTL_T(EU^0) is equivalent to the following formula in QCTL_T(EF^1): ∃ ( (¬∧¬ ⇒ )∧( ⇒ )∧ ( ∧¬) ) , where the quantified symbol does not appear in φ or ψ. Then, we just need to prove the result for QCTL_T(EU^0).

Clearly, the translation τ_u is defined so that the resulting formula is in QCTL_T(EU^0). However, we need to deal with the occurrence of AF end used inside the formula of Lemma 10. Let us first consider the formula AG(φ ⇒ AG ψ), which is satisfied by models where, once φ is found to hold in a certain world w, ψ is satisfied in every world of R*(w). Despite not being in QCTL_T(EU^0), the formula AG(φ ⇒ AG ψ) is equivalent to the following formula: ∀∀ ( () ∧() ∧ (∧) ∧ ( ∧ ¬) ⇒ (¬ ) ) , where the two quantified symbols do not appear in φ or ψ. We then define a formula χ_EG(q) that is expressible with EU modalities and is equivalent to EG q, so that ¬χ_EG(¬end) is equivalent to AF end:

$$\chi_{\mathsf{EG}}(q) \stackrel{\text{def}}{=} \exists p \Big( \neg p \land \mathsf{AG}(\neg q \Rightarrow p) \land \mathsf{AG}(p \Rightarrow \mathsf{AG}\,p) \land \forall r \Big( \mathsf{uniq}(r) \land \mathsf{EF}(r \land \neg p) \Rightarrow \mathsf{EF}\big(r \land \mathsf{EF}(\neg r \land \neg p)\big) \Big) \Big)$$

where p and r do not appear in q. This formula is expressible in QCTL_T(EU^0), as every subformula that is not in this fragment is an instance of AG(φ ⇒ AG ψ). Then, we conclude that AF end is expressible in QCTL_T(EU^0), leading to the following result.

**Theorem 2.** *The satisfiability problems of* QCTL_T(EU^0) *and* QCTL_T(EF^1) *are* TOWER*-complete.*

## **4.3 From** ALT **to Modal Logic of Heaps and Modal Separation Logic**

In [17] and later in [18], two families of logics are presented, respectively called *modal logic of heaps* (MLH) and *modal separation logic* (MSL). At their core, both logics can be seen as modal logics extended with separating connectives, hence mixing separation logic (Section 4.1) with temporal aspects as in quantified CTL (Section 4.2). As we have already shown how ALT is captured by these two latter logics, it is natural to ask whether the same holds for MLH and MSL. In this section, we show that this is indeed the case and that, as for the previous two sections, ALT allows us to refine the analysis of these logics. Both MLH and MSL are interpreted on finite Kripke functions. A *finite Kripke function* is a Kripke structure (W, R, V) (see Section 4.2 for its definition) where W is infinite and R, instead of being left-total, is finite and weakly functional, i.e. |R| ∈ ℕ and for every w, w′, w′′ ∈ W, if (w, w′) ∈ R and (w, w′′) ∈ R then w′ = w′′. As W and ℕ are both countably infinite sets, without loss of generality we assume W = ℕ. Two Kripke structures K₁ = (W, R₁, V) and K₂ = (W, R₂, V) are disjoint if R₁ ∩ R₂ = ∅. When this holds, K₁ + K₂ denotes the model (W, R₁ ∪ R₂, V). To shorten the presentation, in the following diagram we introduce a language having the operators from MSL and MLH, and summarise known and new results on these logics (where p ∈ AP):

φ := p ∣ ⟨≠⟩φ ∣ *⊤* ∣ φ ∧ φ ∣ ¬φ ∣ ◊φ ∣ φ ∗ φ ∣ ⟨U⟩φ ∣ ◊⁻¹φ — MSL: TOWER-complete from [18]. MLH: TOWER-complete from [17].

As defined below, ◊ is the standard alethic modality from modal logic, ◊⁻¹ is its converse modality, and ⟨≠⟩ is the *elsewhere* modality, which generalises the somewhere modality ⟨U⟩ as ⟨U⟩φ = φ ∨ ⟨≠⟩φ. For a *pointed model* (K, w), where K = (W, R, V) is a finite Kripke function and w ∈ W, the satisfaction relation *⊧* is defined as follows:

(K, w) *⊧* p ⇔ w ∈ V(p). (K, w) *⊧* ◊φ ⇔ there is w′ ∈ R(w) s.t. (K, w′) *⊧* φ. (K, w) *⊧* ◊⁻¹φ ⇔ there is w′ ∈ W with (w′, w) ∈ R s.t. (K, w′) *⊧* φ. (K, w) *⊧* ⟨≠⟩φ ⇔ there is w′ ∈ W ⧵ {w} s.t. (K, w′) *⊧* φ. (K, w) *⊧* ⟨U⟩φ ⇔ there is w′ ∈ W s.t. (K, w′) *⊧* φ. (K, w) *⊧* φ ∗ ψ ⇔ there are disjoint K₁, K₂ s.t. K₁ + K₂ = K, (K₁, w) *⊧* φ and (K₂, w) *⊧* ψ.
Looking at the diagram above, and compared to the work in [18], ALT allows us to show that propositional symbols and the elsewhere modality can be removed from MSL without changing the complexity status of its satisfiability problem. Similarly, ALT allows us to refine the analysis of the complexity of SAT(MLH) by showing that the ◊⁻¹ modality is not needed in order to achieve non-elementary complexities.
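The two structural conditions on finite Kripke functions defined above — weak functionality and disjoint composition — can be sketched operationally. In the following Python fragment (names are ours), an accessibility relation R is a set of world pairs:

```python
def weakly_functional(R):
    """R is weakly functional iff no world has two distinct successors."""
    succ = {}
    for w, v in R:
        if succ.setdefault(w, v) != v:
            return False
    return True

def compose(R1, R2):
    """K1 + K2 is only defined when the accessibility relations are
    disjoint; the union of two weakly functional relations need not be
    weakly functional, so we also check it here as a sanity condition."""
    if R1 & R2:
        raise ValueError("structures are not disjoint")
    R = R1 | R2
    if not weakly_functional(R):
        raise ValueError("union is not weakly functional")
    return R
```

When ∗ splits a finite Kripke function, both halves are automatically weakly functional, since they are subsets of a weakly functional relation; the extra check only matters when composing arbitrary relations.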

Let (F, t, n) be a pointed forest and let (K, w) be a pointed model where K = (W, R, V). For the reduction, we use w to encode the current node n. Encoding t is not so immediate, as MLH does not have propositional symbols. A possible solution is to encode it as a self-loop, so that the formula Hit is translated to a query stating that w reaches the self-loop. As done in Section 4.1, we define the formula size = 1 = ⟨U⟩◊*⊤* ∧ ¬(⟨U⟩◊*⊤* ∗ ⟨U⟩◊*⊤*), which is satisfied whenever |R| = 1. We also define the modalities ⧫ and ⧫∗ in MLH: ⧫ML φ = (size = 1) ∗ φ and ⧫∗ML φ = *⊤* ∗ φ. Lastly, we introduce the formula selfLoop = ⧫∗ML(◊◊*⊤* ∧ ¬⧫ML⧫ML*⊤*), which is satisfied by (K, w) iff (w, w) ∈ R. Suppose for a moment that we are able to use this formula to characterise the class of finite Kripke functions (W, R, V) where there is exactly one cycle, and this cycle is a self-loop on a world ℓ. Then, we use ℓ to encode the target node t of a finite forest (F, t, n), while being careful that the ⧫ and ⧫∗ operators of ALT are translated in such a way that the self-loop on ℓ is preserved. Because of the specific treatment of ℓ, it is convenient to assume that the current evaluation node is encoded by a world different from ℓ, which reflects on the translation of ⟨U⟩. The admissibility of this assumption follows from Lemma 4.

We encode pointed forests as finite Kripke functions. Let (F, t, n) be a pointed forest s.t. n ∉ dom(F) and n ≠ t. A finite Kripke function ((W, R, V), w) (recall, W = ℕ) is an *encoding* of (F, t, n) iff for every w′, w′′ ∈ W we have (w′, w′′) ∈ R ⇔ (F(w′) = w′′ or w′ = w′′ = t). Notice how R is essentially defined from F by adding the self-loop (t, t). The translation τ(φ) in MLH of a formula φ in ALT is homomorphic for *⊤* and Boolean connectives (as is the case for SL in Section 4.1), and otherwise it is defined as

$$\begin{array}{ll}
\tau(\mathsf{Hit}) \stackrel{\text{def}}{=} \blacklozenge^{*}_{\mathrm{ML}}(\Diamond\top \land [\mathsf{U}](\Diamond\top \Rightarrow \Diamond\Diamond\top)). & \tau(\blacklozenge\varphi) \stackrel{\text{def}}{=} \blacklozenge_{\mathrm{ML}}(\tau(\varphi) \land \langle\mathsf{U}\rangle\,\mathsf{selfLoop}).\\
\tau(\mathsf{Miss}) \stackrel{\text{def}}{=} \Diamond\top \land \neg\tau(\mathsf{Hit}). & \tau(\blacklozenge^{*}\varphi) \stackrel{\text{def}}{=} \blacklozenge^{*}_{\mathrm{ML}}(\tau(\varphi) \land \langle\mathsf{U}\rangle\,\mathsf{selfLoop}).\\
\multicolumn{2}{l}{\tau(\langle\mathsf{U}\rangle\varphi) \stackrel{\text{def}}{=} \langle\mathsf{U}\rangle(\neg\mathsf{selfLoop} \land \tau(\varphi)).}
\end{array}$$

We highlight two points of this translation. First, τ(Hit) essentially asks to find a submodel where every path reaches the self-loop and the current evaluation node lies on one of these paths. Second, notice how the translations of ⧫ and ⧫∗ check that the model is updated so that the self-loop is not lost, as required by our encoding. It should be noted that this requirement could not be met if we were translating the definition of ALT from [31], which features the ∗ operator. Indeed, by partitioning the model into two pieces, this operator removes the self-loop from one of the two parts, breaking our encoding. The following lemma (proved by structural induction on φ) shows the correctness of our translation.

**Lemma 11.** *Let (F, t, n) be a pointed forest s.t. n ≠ t and n ∉ dom(F). Let (K, w) be an encoding of (F, t, n). Given a formula φ in* ALT*, (F, t, n) ⊧ φ iff (K, w) ⊧ τ(φ).*

To conclude the reduction, we show that we can characterise the class of models encoding pointed forests, i.e. the finite Kripke functions with exactly one cycle, which is moreover a self-loop. We first define the formula ⧫∗ML(⟨U⟩◊*⊤* ∧ [U](◊*⊤* ⇒ ◊◊*⊤*)), which checks whether a finite Kripke function has at least one cycle. Then, the desired property can simply be defined by stating that there is a self-loop which, whenever removed, leads to an acyclic submodel: ⟨U⟩(selfLoop ∧ ¬⧫ML(□⟂ ∧ ⧫∗ML(⟨U⟩◊*⊤* ∧ [U](◊*⊤* ⇒ ◊◊*⊤*)))).

**Lemma 12.** *Every formula φ in* ALT *is equisatisfiable with the conjunction of* τ(φ) *and the formula above characterising the encodings of pointed forests.*

For the proof of Lemma 12, both Lemma 4(1) and 4(2) are used in order to restrict ourselves to pointed forests (F, t, n) s.t. n ≠ t and n ∉ dom(F). Then, we apply Lemma 11.
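The class of encodings used in Lemma 12 — finite weakly functional relations whose only cycle is a self-loop — can also be checked procedurally. A small sketch (our naming), under the assumption that R is given as a dict mapping each world to its unique successor:

```python
def encodes_pointed_forest(R):
    """True iff the finite functional relation R has exactly one cycle
    and that cycle is a self-loop: removing the unique self-loop must
    leave an acyclic relation, mirroring the characterising formula."""
    self_loops = [w for w, v in R.items() if w == v]
    if len(self_loops) != 1:
        return False
    rest = {w: v for w, v in R.items() if w != v}
    for start in rest:                 # the remainder must be acyclic
        seen, cur = {start}, rest[start]
        while cur in rest:
            if cur in seen:
                return False
            seen.add(cur)
            cur = rest[cur]
    return True
```

For instance, the relation {1 ↦ 2, 2 ↦ 3, 3 ↦ 3} (a path ending in a self-loop) is a valid encoding, while a relation containing a 2-cycle or two self-loops is not.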

**Theorem 3.** *The fragments of* MLH *and* MSL *with Boolean operators, the* ◊ *and* ⟨U⟩ *modalities, and* ∗ *(alternatively,* ⧫ML *and* ⧫∗ML*) have a* TOWER*-complete satisfiability problem.*

## **5 Conclusions**

We studied an *Auxiliary Logic on Trees* (ALT), a quite simple formalism that admits a TOWER-complete satisfiability problem. ALT is shown to be easily captured by various non-elementary logics: first-order separation logic, quantified CTL, modal logic of heaps and modal separation logic. Through ALT, we were not only able to connect these logics, but also to refine their analysis and find strict fragments that are still TOWER-hard. Most importantly, with ALT we hope to have shown a set of simple and concrete properties, centred around reachability and submodel reasoning, that when put together lead to logics having a non-elementary satisfiability problem.

This work leaves a few questions open. First, the fragments of ALT where ⧫ or ⧫∗ are removed from the logic have not been studied yet. The logic without ⧫∗ is of particular interest, as it is connected with the sabotage logics from [4]. Second, the analysis done on first-order separation logic and on modal logic of heaps (Sections 4.1 and 4.3) reveals that the complexity of these logics does not change when the ∗ operator and the emp predicate are replaced with the less general operators ⧫ and ⧫∗. We find this point interesting since, from an overview of the literature, it seems that this result also holds for the separation logics considered in [9,17,19,30,31]. Moreover, for the logics whose expressiveness is known, i.e. the ones in [19,30], it seems that the expressive power also remains unchanged. However, we struggle to see how to uniformly express the operator ∗ with ⧫ and ⧫∗, as the resulting logics reason about the model in different ways (as shown in Section 2). Lastly, this work illustrates the potential of ALT as a tool for proving the TOWER-hardness of logics interpreted on tree-like structures. As the operators of our logic are simple, we hope ALT will be useful to study logics with unknown complexities.

**Acknowledgements.** I would like to thank S. Demri and E. Lozes for their feedback.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **The Inconsistent Labelling Problem of Stutter-Preserving Partial-Order Reduction**

Thomas Neele¹, Antti Valmari², and Tim A.C. Willemse¹

¹ Eindhoven University of Technology, Eindhoven, The Netherlands {t.s.neele, t.a.c.willemse}@tue.nl ² University of Jyväskylä, Jyväskylä, Finland

antti.valmari@jyu.fi

**Abstract.** In model checking, partial-order reduction (POR) is an effective technique to reduce the size of the state space. Stubborn sets are an established variant of POR and have seen many applications over the past 31 years. One of the early works on stubborn sets shows that a combination of several conditions on the reduction is sufficient to preserve stutter-trace equivalence, making stubborn sets suitable for model checking of linear-time properties. In this paper, we identify a flaw in the reasoning and show with a counter-example that stutter-trace equivalence is not necessarily preserved. We propose a solution together with an updated correctness proof. Furthermore, we analyse in which formalisms this problem may occur. The impact on practical implementations is limited, since they all compute a correct approximation of the theory.

## **1 Introduction**

In formal methods, model checking is a technique to automatically decide the correctness of a system's design. The many interleavings of concurrent processes can cause the state space to grow exponentially with the number of components, known as the *state-space explosion* problem. *Partial-order reduction* (POR) is one technique that can alleviate this problem. Several variants of POR exist, such as *ample sets* [11], *persistent sets* [7] and *stubborn sets* [16,21]. For each of those variants, sufficient conditions for preservation of stutter-trace equivalence have been identified. Since LTL without the next operator (LTL−<sup>X</sup>) is invariant under finite stuttering, this allows one to check most LTL properties under POR.

However, the correctness proofs for these methods are intricate and not reproduced often. For stubborn sets, LTL−<sup>X</sup>-preserving conditions and an accompanying correctness result were first presented in [15], and discussed in more detail in [17]. While trying to reproduce the proof for [17, Theorem 2] (see also Theorem 1 in the current work), we ran into an issue while trying to prove a certain property of the construction used in the original proof [17, Construction 1]. This led us to discover that stutter-trace equivalence is not necessarily preserved. We will refer to this as the *inconsistent labelling problem*. The essence of the problem is that POR in general, and the proofs in [17] in particular, reason mostly about actions, which label the transitions. The only relevance of the state labelling is that it determines which actions are *visible*. On the other hand, stutter-trace equivalence and the LTL semantics are purely based on state labels. The correctness proof in [17] does not deal properly with this disparity. Further investigation shows that the same problem also occurs in two works of Beneš *et al.* [2,3], who apply ample sets to state/event LTL model checking.

Consequently, any application of stubborn sets in LTL−<sup>X</sup> model checking is possibly unsound, both for safety and liveness properties. In literature, the correctness of several theories [9,10,18] relies on the incorrect theorem.

Our contributions are as follows:


Our investigation shows that probably all practical implementations of stubborn sets compute an approximation which resolves the inconsistent labelling problem. Furthermore, POR methods based on the standard independence relation, such as ample sets and persistent sets, are not affected.

The rest of the paper is structured as follows. In Section 2, we introduce the basic concepts of stubborn sets and stutter-trace equivalence, which is not preserved in the counter-example of Section 3. A solution to the inconsistent labelling problem is discussed in Section 4, together with an updated correctness proof. Sections 5 and 6 discuss several settings in which correctness is not affected. Finally, Section 7 presents related work and Section 8 presents a conclusion.

## **2 Preliminaries**

Since LTL relies on state labels and POR relies on edge labels, we assume the existence of some fixed set of atomic propositions *AP* to label the states and a fixed set of edge labels *Act*, which we will call *actions*. Actions are typically denoted with the letter a.

**Definition 1.** *A* labelled state transition system*, short* LSTS*, is a directed graph TS* = (S, →, ŝ, L)*, where:*


We write s −a→ t whenever (s, a, t) ∈ →. A *path* is a (finite or infinite) alternating sequence of states and actions: s₀ −a₁→ s₁ −a₂→ s₂ ⋯. We sometimes omit the intermediate and/or final states if they are clear from the context or not relevant, and write s −a₁⋯aₙ→ t or s −a₁⋯aₙ→ for finite paths and s −a₁a₂⋯→ for infinite paths. Paths that start in the initial state ŝ are called *initial paths*. Given a path π = s₀ −a₁→ s₁ −a₂→ s₂ ⋯, the *trace* of π is the sequence of state labels observed along π, *viz.* L(s₀)L(s₁)L(s₂)⋯. An action a is *enabled* in a state s, notation s −a→, if and only if there is a transition s −a→ t for some t. In a given LSTS *TS*, *enabled*_TS(s) is the set of all enabled actions in a state s. A set I of *invisible* actions is chosen such that if (but not necessarily only if) a ∈ I, then for all states s and t, s −a→ t implies L(s) = L(t). Note that this definition allows the set I to be under-approximated. An action that is not invisible is called *visible*. We say *TS* is *deterministic* if and only if s −a→ t and s −a→ t′ imply t = t′, for all states s, t and t′ and actions a. To indicate that *TS* is not necessarily deterministic, we say *TS* is *non-deterministic*.
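The notions above can be made concrete in a few lines. The following is a minimal sketch in our own encoding, not taken from the paper: transitions are (source, action, target) triples and the labelling maps each state to a frozenset of atomic propositions.

```python
# Minimal LSTS sketch (hypothetical encoding): transitions as
# (source, action, target) triples; L maps each state to a frozenset
# of atomic propositions.
class LSTS:
    def __init__(self, transitions, initial, labelling):
        self.transitions = set(transitions)
        self.initial = initial          # the initial state s-hat
        self.L = labelling

    def enabled(self, s):
        """enabled_TS(s): all actions a with s --a--> t for some t."""
        return {a for (src, a, _) in self.transitions if src == s}

    def is_deterministic(self):
        """No state may have two distinct a-successors for the same a."""
        succ = {}
        for (s, a, t) in self.transitions:
            if succ.setdefault((s, a), t) != t:
                return False
        return True

def trace(ts, states):
    """The trace of a path, given its state sequence s0 s1 s2 ..."""
    return [ts.L[s] for s in states]

# Tiny example: s0 --a--> s1 --b--> s1, with q holding only in s1.
ts = LSTS({("s0", "a", "s1"), ("s1", "b", "s1")}, "s0",
          {"s0": frozenset(), "s1": frozenset({"q"})})
```

Here the invisible set I can be any set of actions a whose transitions never change the label, matching the under-approximation allowed above.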

## **2.1 Stubborn sets**

In POR, *reduction functions* play a central role. A reduction function r : S → 2^Act indicates which transitions to explore in each state. When starting at the initial state ŝ, a reduction function induces a *reduced LSTS* as follows.

**Definition 2.** *Let TS* = (S, →, ŝ, L) *be an LSTS and* r : S → 2^Act *a reduction function. Then the* reduced LSTS *induced by* r *is defined as TS*ᵣ = (Sᵣ, →ᵣ, ŝ, Lᵣ)*, where* Lᵣ *is the restriction of* L *to* Sᵣ*, and* Sᵣ *and* →ᵣ *are the smallest sets such that the following holds:*

**–** ŝ ∈ Sᵣ*; and*
**–** *if* s ∈ Sᵣ*,* s −a→ t *and* a ∈ r(s)*, then* t ∈ Sᵣ *and* s −a→ᵣ t*.*
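Definition 2 describes an inductive closure, which amounts to a worklist exploration. A minimal sketch in our own encoding (the paper itself gives no algorithm): transitions as (source, action, target) triples, and r as a Python callable.

```python
from collections import deque

def reduced_lsts(transitions, initial, r):
    """Explore only the transitions allowed by the reduction function r,
    building the smallest S_r and ->_r of Definition 2 (worklist/BFS)."""
    S_r, arrows_r = {initial}, set()
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for (src, a, t) in transitions:
            if src == s and a in r(s):
                arrows_r.add((s, a, t))
                if t not in S_r:
                    S_r.add(t)
                    queue.append(t)
    return S_r, arrows_r

# Diamond-shaped toy system; in s0 the (hypothetical) reduction keeps only "a".
trans = {("s0", "a", "s1"), ("s0", "b", "s2"),
         ("s1", "b", "s3"), ("s2", "a", "s3")}
r = lambda s: {"a"} if s == "s0" else {"a", "b"}
```

On this toy system the reduction prunes the interleaving through s2, illustrating how a reduced LSTS is a subgraph of the original.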

Note that we have →ᵣ ⊆ →. In the remainder of this paper, we will assume the reduced LSTS is finite. This is essential for the correctness of the approach detailed below. In general, a reduction function is not guaranteed to preserve almost any property of an LSTS. Below, we list a number of conditions that have been proposed in literature; they aim to preserve LTL−<sup>X</sup>. Here, we call an action a a *key action* in s iff for all paths s −a₁⋯aₙ→ s′ such that a₁ ∉ r(s), ..., aₙ ∉ r(s), it holds that s′ −a→. We typically denote key actions by a_key.


**Fig. 1:** Visual representation of condition **D1**.

These conditions are used to define *strong* and *weak* stubborn sets in the following way.

**Definition 3.** *A reduction function* r : S → 2^Act *is a* strong stubborn set *iff for all states* s ∈ S*, the conditions D0, D1, D2, V, I, L all hold.*

**Definition 4.** *A reduction function* r : S → 2^Act *is a* weak stubborn set *iff for all states* s ∈ S*, the conditions D1, D2w, V, I, L all hold.*

Below, we also use 'weak/strong stubborn set' to refer to the set of actions r(s) in some state s. First, note that key actions are always enabled; take n = 0 in the definition above. Furthermore, a stubborn set can never introduce new deadlocks, either by **D0** or **D2w**. Condition **D1** enforces that a key action a_key ∈ r(s) does not disable other paths that are not selected for the stubborn set. A visual representation of condition **D1** can be found in Figure 1. When combined, **D1** and **D2w** are sufficient conditions for the preservation of deadlocks. Condition **V** enforces that the paths s −a₁⋯aₙa→ s′ₙ and s −aa₁⋯aₙ→ s′ₙ in **D1** contain the same sequence of visible actions. The purpose of condition **I** is to preserve the possibility to perform an invisible action, if one is enabled. Finally, we have condition **L** to deal with the *action-ignoring problem*, which occurs when an action is never selected for the stubborn set and thus always ignored. Since we assume that the reduced LSTS is finite, it suffices to reason in **L** about every cycle instead of every infinite path. The combination of **I** and **L** helps to preserve divergences (infinite paths containing only invisible actions).

Conditions **D0** and **D2** together imply **D2w**, and thus every strong stubborn set is also a weak stubborn set. Since the reverse does not necessarily hold, weak stubborn sets might offer more reduction.

#### **2.2 Weak and Stutter Equivalence**

To reason about the similarity of an LSTS *TS* and its reduced LSTS *TS*r, we introduce the notions of *weak equivalence*, which operates on actions, and *stutter equivalence*, which operates on states. The definitions are generic, so that they can also be used in Section 6.

**Definition 5.** *Two paths* π *and* π′ *are weakly equivalent with respect to a set of actions* A*, notation* π ∼_A π′*, if and only if they are both finite or both infinite and their respective projections on Act* \ A *are equal.*

**Definition 6.** *The* no-stutter trace *under labelling* L *of a path* s₀ −a₁→ s₁ −a₂→ ⋯ *is the sequence of those* L(sᵢ) *such that* i = 0 *or* L(sᵢ) ≠ L(sᵢ₋₁)*. Paths* π *and* π′ *are stutter equivalent under* L*, notation* π ≜_L π′*, iff they are both finite or both infinite, and they yield the same no-stutter trace under* L*.*

We typically consider weak equivalence with respect to the set of invisible actions I. In that case, we write π ∼ π′. We also omit the subscript for stutter equivalence when reasoning about the standard labelling function and write π ≜ π′. Remark that stutter equivalence is invariant under finite repetitions of state labels, hence its name. We lift both equivalences to LSTSs, and say that *TS* and *TS*′ are *weak-trace equivalent* iff for every initial path π in *TS*, there is a weakly equivalent initial path π′ in *TS*′ and vice versa. Likewise, *TS* and *TS*′ are *stutter-trace equivalent* iff for every initial path π in *TS*, there is a stutter equivalent initial path π′ in *TS*′ and vice versa.
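For finite paths, both equivalences are directly computable. A small sketch (our own helper names, restricted to finite sequences; the infinite case needs the both-finite-or-both-infinite side condition of Definitions 5 and 6):

```python
def no_stutter_trace(labels):
    """Collapse consecutive repetitions of the same state label (Definition 6)."""
    out = []
    for l in labels:
        if not out or out[-1] != l:
            out.append(l)
    return out

def stutter_equivalent(labels1, labels2):
    """Finite-path version of stutter equivalence: equal no-stutter traces."""
    return no_stutter_trace(labels1) == no_stutter_trace(labels2)

def weakly_equivalent(actions1, actions2, invisible):
    """Finite-path version of Definition 5, projecting away invisible actions."""
    project = lambda acts: [a for a in acts if a not in invisible]
    return project(actions1) == project(actions2)

# The trace shape from the counter-example in Section 3, writing "q" for {q}
# and "-" for the empty labelling:
bad = ["-", "q", "-", "-", "q"]
```

The trace `bad` collapses to `["-", "q", "-", "q"]`, which no trace of the reduced LSTS of Section 3 matches.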

In general, weak equivalence and stutter equivalence are incomparable, even for initial paths. However, for some LSTSs, these notions can be related in a certain way. We formalise this in the following definition.

**Definition 7.** *Let TS be an LSTS and* π *and* π′ *two paths in TS that both start in some state* s*. Then, TS is* labelled consistently *iff* π ∼ π′ *implies* π ≜ π′*.*

Note that if an LSTS is labelled consistently, then in particular all weakly equivalent initial paths are also stutter equivalent. Hence, if an LSTS *TS* is labelled consistently and weak-trace equivalent to a subgraph *TS*′, then *TS* and *TS*′ are also stutter-trace equivalent.

Stubborn sets as defined in the previous section aim to preserve stutter-trace equivalence between the original and the reduced LSTS. The motivation behind this is that two stutter-trace equivalent LSTSs satisfy exactly the same formulae [1] in LTL−<sup>X</sup>. The following theorem, which is frequently cited in literature [9,10,18], aims to show that stubborn sets indeed preserve stutter-trace equivalence. Its original formulation reasons about the validity of an arbitrary LTL−<sup>X</sup> formula. Here, we give the alternative formulation based on stutter-trace equivalence.

**Theorem 1.** *[17, Theorem 2] Given an LSTS TS and a weak/strong stubborn set* r*, the reduced LSTS TS*ᵣ *is stutter-trace equivalent to TS.*

The original proof correctly concludes that the stubborn set method preserves the order of visible actions in the reduced LSTS, *i.e.*, *TS* ∼ *TS*ᵣ. However, this only implies preservation of stutter-trace equivalence (*TS* ≜ *TS*ᵣ) if the full LSTS is labelled consistently, so Theorem 1 is invalid in the general case. In the next section, we will see a counter-example which exploits this fact.

## **3 Counter-Example**

Consider the LSTS in Figure 2, which we will refer to as *TS*ᶜ. There is only one atomic proposition q, which holds in the grey states and is false in the other states. The initial state ŝ is marked with an incoming arrow. First, note that this LSTS is deterministic. The actions a₁, a₂ and a₃ are visible and a and a_key are invisible. By setting r(ŝ) = {a, a_key}, which is a weak stubborn set, we obtain a reduced LSTS *TS*ᶜᵣ that does not contain the dashed states and transitions. The original LSTS contains the trace ∅{q}∅∅{q}^ω, obtained by following the path with actions a₁a₂a(a₃)^ω. However, the reduced LSTS does not contain a stutter equivalent trace. This is also witnessed by the LTL−<sup>X</sup> formula □(q ⇒ □(q ∨ □¬q)), which holds for *TS*ᶜᵣ, but not for *TS*ᶜ.

**Fig. 2:** Counter-example showing that stubborn sets do not preserve stuttertrace equivalence. Grey states are labelled with {q}. The dashed transitions and states are not present in the reduced LSTS.

A very similar example can be used to show that strong stubborn sets suffer from the same problem. Consider again the LSTS in Figure 2, but assume that a = a_key, making the LSTS non-deterministic. Now, r(ŝ) = {a} is a strong stubborn set and again the trace ∅{q}∅∅{q}^ω is not preserved in the reduced LSTS. In Section 4.3, we will see why the inconsistent labelling problem does not occur for deterministic systems under strong stubborn sets.

The core of the problem lies in the fact that condition **D1**, even when combined with **V**, does not enforce that the two paths it considers are stutter equivalent. Consider the paths s −a→ and s −a₁a₂a→ and assume that a ∈ r(s) and a₁ ∉ r(s), a₂ ∉ r(s). Condition **V** ensures that at least one of the following two holds: (i) a is invisible, or (ii) a₁ and a₂ are invisible. Half of the possible scenarios are depicted in Figure 3; the other half are symmetric. Again, the grey states (and only those states) are labelled with {q}.

The two cases delimited with a solid line are problematic. In both LSTSs, the paths s −a₁a₂a→ s′ and s −aa₁a₂→ s′ are weakly equivalent, since a is invisible. However, they are not stutter equivalent, and therefore these LSTSs are not labelled consistently. The topmost of these two LSTSs forms the core of the counter-example *TS*ᶜ, with the rest of *TS*ᶜ serving to satisfy condition **D2**/**D2w**.

**Fig. 3:** Nine possible scenarios when a ∈ r(s) and a₁ ∉ r(s), a₂ ∉ r(s), according to conditions **D1** and **V**. The dotted and dashed lines indicate when a or a₁, a₂ are invisible, respectively.

## **4 Strengthening Condition D1**

To fix the issue with inconsistent labelling, we propose to strengthen condition **D1** as follows.

**D1'** For all a ∈ r(s) and a₁ ∉ r(s), ..., aₙ ∉ r(s), if s −a₁→ s₁ −a₂→ ⋯ −aₙ→ sₙ −a→ s′ₙ, then there are states s′, s′₁, ..., s′ₙ₋₁ such that s −a→ s′ −a₁→ s′₁ −a₂→ ⋯ −aₙ→ s′ₙ. Furthermore, if a is invisible, then sᵢ −a→ s′ᵢ for every 1 ≤ i < n.

This new condition **D1'** provides a form of *local* consistent labelling when one of a₁, ..., aₙ is visible. In this case, **V** implies that a is invisible and, consequently, the presence of the transitions sᵢ −a→ s′ᵢ implies L(sᵢ) = L(s′ᵢ). Hence, the problematic cases of Figure 3 are resolved; a correctness proof is given below.

Condition **D1'** is very similar to condition **C1** [5], which is common in the context of ample sets. However, **C1** requires that action a is *globally* independent of each of the actions a₁, ..., aₙ, while **D1'** merely requires a kind of *local* independence. Persistent sets [7] also rely on a condition similar to **D1'**, and require local independence.

#### **4.1 Implementation**

In practice, most, if not all, implementations of stubborn sets approximate **D1** based on a binary relation ↝ₛ on actions. This relation may (partly) depend on the current state s and it is defined such that **D1** can be satisfied by ensuring that if a ∈ r(s) and a ↝ₛ a′, then also a′ ∈ r(s). A set satisfying **D0**, **D1**, **D2**, **D2w**, **V** and/or **I** can be found by searching for a suitable *strongly connected component* in the graph (*Act*, ↝ₛ). Condition **L** is dealt with by other techniques.

Practical implementations construct ↝ₛ by analysing how any two actions a and a′ interact. If a is enabled, the simplest (but not necessarily the best possible) strategy is to make a ↝ₛ a′ if and only if a and a′ access at least one variable in common. This can be relaxed, for instance, by not considering commutative accesses, such as writing to and reading from a FIFO buffer. As a result, ↝ₛ can only detect reduction opportunities in (sub)graphs of the shape

$$\begin{array}{ccccccc}
s & \xrightarrow{a_1} & s_1 & \rightarrow \cdots \rightarrow & s_{n-1} & \xrightarrow{a_n} & s_n \\
a\,\big\downarrow & & a\,\big\downarrow & & a\,\big\downarrow & & \big\downarrow\,a \\
s' & \xrightarrow{a_1} & s'_1 & \rightarrow \cdots \rightarrow & s'_{n-1} & \xrightarrow{a_n} & s'_n
\end{array}$$

where a ∈ r(s) and a₁ ∉ r(s), ..., aₙ ∉ r(s). The presence of the vertical a-transitions in s₁, ..., sₙ₋₁ implies that **D1'** is also satisfied by such implementations.
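The closure requirement, that a′ must be included whenever some included a points to it, can be computed with a simple worklist. A sketch under our own assumptions: the relation is given as an adjacency dict per state, and the SCC-based search for a *good* closed set is not shown.

```python
def stubborn_closure(seed, leads_to):
    """Close a seed set of actions under a binary relation on actions:
    whenever a is in the set and a points to a', add a' as well.
    leads_to is a dict from each action to the actions it points to
    (hypothetical encoding of the state-dependent relation)."""
    r = set(seed)
    work = list(seed)
    while work:
        a = work.pop()
        for b in leads_to.get(a, ()):
            if b not in r:
                r.add(b)
                work.append(b)
    return r

# Hypothetical dependencies: picking a1 forces a2, which forces a3.
deps = {"a1": ["a2"], "a2": ["a3"], "a3": []}
```

Starting from a single enabled action as seed, the returned set is a candidate r(s); real implementations then check the remaining conditions on it.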

#### **4.2 Correctness**

To show that **D1'** indeed resolves the inconsistent labelling problem, we reproduce the construction in the original proof [17, Construction 1] in two lemmata and show that it preserves stutter equivalence. Below, recall that →ᵣ indicates which transitions occur in the reduced state space.

**Lemma 1.** *Let* r *be a weak stubborn set, where condition D1 is replaced by D1', and let* π = s₀ −a₁→ ⋯ −aₙ→ sₙ −a→ s′ₙ *be a path such that* a₁ ∉ r(s₀), ..., aₙ ∉ r(s₀) *and* a ∈ r(s₀)*. Then, there is a path* π′ = s₀ −a→ᵣ s′₀ −a₁→ ⋯ −aₙ→ s′ₙ *such that* π ≜ π′*.*

*Proof.* The existence of π′ follows directly from condition **D1'**. Due to condition **V** and our assumption that a₁ ∉ r(s₀), ..., aₙ ∉ r(s₀), it cannot be the case that both a is visible and at least one of a₁, ..., aₙ is visible. If a is invisible, then the traces of s₀ −a₁→ ⋯ −aₙ→ sₙ and s′₀ −a₁→ ⋯ −aₙ→ s′ₙ are equivalent, since **D1'** implies that sᵢ −a→ s′ᵢ for every 0 ≤ i ≤ n, so L(s′ᵢ) = L(sᵢ). Otherwise, if all of a₁, ..., aₙ are invisible, then the sequences of labels observed along π and π′ have the shape L(s₀)ⁿ⁺¹L(s′₀) and L(s₀)L(s′₀)ⁿ⁺¹, respectively. We conclude that π ≜ π′. ∎

**Lemma 2.** *Let* r *be a weak stubborn set, where condition D1 is replaced by D1', and let* π = s₀ −a₁→ s₁ −a₂→ ⋯ *be a path such that* aᵢ ∉ r(s₀) *for any* aᵢ *that occurs in* π*. Then, the following holds:*


*In either case,* π ≜ π′*.*

*Proof.* Let K be the set of key actions in s₀. If a₁ is invisible, K contains at least one invisible action, due to **I**. Otherwise, if a₁ is visible, we reason that K is not empty (condition **D2w**) and all actions in r(s₀), and thus also all actions in K, are invisible, due to **V**. In the remainder, let a_key be an invisible key action.

In case π has finite length n, the existence of sₙ −a_key→ s′ₙ and s₀ −a_key→ᵣ s′₀ −a₁→ ⋯ −aₙ→ s′ₙ follows from the definition of key actions and **D1'**, respectively.

If π is infinite, we can apply the definition of key actions and **D1'** successively to obtain a path πᵢ = s₀ −a_key→ s′₀ −a₁→ ⋯ −aᵢ→ s′ᵢ for every i ≥ 0, with sⱼ −a_key→ s′ⱼ for every 1 ≤ j < i. Since the reduced state space is finite, infinitely many of these paths must use the same state as s′₀. At most one of them ends at s′₀ (the one with i = 0), so infinitely many continue from s′₀. Of them, infinitely many must use the same s′₁, again because the reduced state space is finite. Again, at most one of them is lost because of ending at s′₁. This reasoning can continue without limit, proving the existence of π′ = s₀ −a_key→ᵣ s′₀ −a₁→ s′₁ −a₂→ ⋯, with sⱼ −a_key→ s′ⱼ for every j ≥ 0.

Since a_key is invisible, we have L(sⱼ) = L(s′ⱼ) for every j ≥ 0. This implies π ≜ π′. ∎

Lemmata 1 and 2 coincide with branches 1 and 2 of [17, Construction 1], respectively, but contain the stronger result that π ≜ π′. Thus, when applied in the proof of [17, Theorem 2] (see also Theorem 1), this yields the result that stubborn sets with condition **D1'** preserve stutter-trace equivalence.

**Theorem 2.** *Given an LSTS TS and a weak/strong stubborn set* r*, where condition D1 is replaced by D1', the reduced LSTS TS*ᵣ *is stutter-trace equivalent to TS.*

We do not reproduce the complete proof, but provide insight into the application of the lemmata with the following example.

*Example 1.* Consider the path obtained by following a₁a₂a₃ in Figure 4. Lemmata 1 and 2 show that a₁a₂a₃ can always be mimicked in the reduced LSTS, while preserving stutter equivalence. In this case, the path is mimicked by the path corresponding to a_key a₂ a₁ a′_key a₃, drawn with dashes. The new path reorders the actions a₁, a₂ and a₃ according to the construction of Lemma 1 and introduces the key actions a_key and a′_key according to Lemma 2.

We remark that Lemma 2 also holds if the reduced LSTS is infinite, but finitely branching.

### **4.3 Deterministic LSTSs**

As already noted in Section 3, strong stubborn sets for deterministic systems do not suffer from the inconsistent labelling problem. The following lemma, which also appeared as [20, Lemma 4.2], shows why.

**Lemma 3.** *For deterministic LSTSs, conditions D1 and D2 together imply D1'.*

**Fig. 4:** Example of how the trace a₁a₂a₃ can be mimicked by introducing additional actions and moving a₂ to the front (dashed trace). Transitions that are drawn in parallel have the same label.

## **5 Safe Logics**

In this section, we will identify two logics, *viz.* reachability and CTL−<sup>X</sup>, which are not affected by the inconsistent labelling problem. This is either due to their limited expressivity or the extra POR conditions that are required.

### **5.1 Reachability properties**

Although the counter-example of Section 3 shows that stutter-trace equivalence is in general not preserved by stubborn sets, some fragments of LTL−<sup>X</sup> are preserved. One such class is that of reachability properties, which are of the shape ◊f or □f, where f is a formula not containing temporal operators.

**Theorem 3.** *Let TS be an LSTS,* r *a reduction function that satisfies either D0, D1, D2, V and L or D1, D2w, V and L, and TS*ᵣ *the reduced LSTS. For all possible labellings* l ⊆ *AP, TS contains a path to a state* s *such that* L(s) = l *iff TS*ᵣ *contains a path to a state* s′ *such that* L(s′) = l*.*

*Proof.* The 'if' case is trivial, since *TS*ᵣ is a subgraph of *TS*. For the 'only if' case, we reason as follows. Let *TS* = (S, →, ŝ, L) be an LSTS and π = s₀ −a₁→ ⋯ −aₙ→ sₙ a path such that s₀ = ŝ. We mimic this path by repeatedly taking some enabled action a that is in the stubborn set, according to the following schema. Below, we assume the path to be mimicked contains at least one visible action. Otherwise, its first state would have the same labelling as sₙ.


The second case cannot be repeated infinitely often, due to condition **L**. Hence, after a finite number of steps, we reach a state s′ₙ with L(s′ₙ) = L(sₙ). ∎

We remark that more efficient mechanisms for reachability checking under POR have been proposed, such as condition **S** [21], which can replace **L**, or conditions based on *up-sets* [13]. Another observation is that model checking of LTL−<sup>X</sup> properties can be reduced to reachability checking by computing the cross-product of a Büchi automaton and an LSTS [1], in the process resolving the inconsistent labelling problem. Peled [12] shows how this approach can be combined with POR; however, see [14].

### **5.2 Deterministic LSTSs and CTL−<sup>X</sup> Model Checking**

In this section, we will consider the inconsistent labelling problem in the setting of CTL−<sup>X</sup> model checking. When applying stubborn sets in that context, stronger conditions are required to preserve the branching structure that CTL−<sup>X</sup> reasons about. Namely, the original LSTS must be deterministic and one more condition needs to be added [5]:

**C4** Either r(s) = *Act* or r(s) ∩ *enabled*(s) = {a} for some a ∈ *Act*.

We slightly changed its original formulation to match the setting of stubborn sets. A weaker condition, called **Ä8**, which does not require determinism of the whole LSTS, is proposed in [19]. With **C4**, strong and weak stubborn sets collapse, as shown by the following lemma.

**Lemma 4.** *Conditions D2w and C4 together imply D0 and D2.*

*Proof.* Let *TS* be an LSTS, s a state and r a reduction function that satisfies **D2w** and **C4**. Condition **D0** is trivially implied by **C4**. Using **C4**, we distinguish two cases: either r(s) contains precisely one enabled action a, or r(s) = *Act*. In the former case, this single action a must be a key action, according to **D2w**. Hence, **D2**, which requires that all enabled actions in r(s) are key actions, is satisfied. Otherwise, if r(s) = *Act*, we consider an arbitrary action a that satisfies **D2**'s precondition that s −a→. Given a path s −a₁⋯aₙ→, the condition that a₁ ∉ r(s), ..., aₙ ∉ r(s) only holds if n = 0. We conclude that **D2**'s condition s −a₁⋯aₙa→ is satisfied by the assumption s −a→. ∎

It follows from Lemmata 3 and 4 and Theorem 2 that CTL−<sup>X</sup> model checking of deterministic systems with stubborn sets does not suffer from the inconsistent labelling problem. The same holds for condition **Ä8**, as already shown in [19].

## **6 Petri Nets**

Petri nets are a widely-known formalism for modelling concurrent processes and have seen frequent use in the application of stubborn-set theory [4,10,21,22]. A Petri net contains a set of *places* P and a set of *structural transitions* T. *Arcs* between places and structural transitions are weighted according to a total function W : (P × T) ∪ (T × P) → ℕ. The state space of the underlying LSTS is the set M of all *markings*; a marking m is a function P → ℕ, which assigns a number of *tokens* to each place. The LSTS contains a transition m −t→ m′ iff m(p) ≥ W(p, t) and m′(p) = m(p) − W(p, t) + W(t, p) for all places p ∈ P. As before, we assume the LSTS contains some labelling function L : M → 2^AP. More details on the labels are given below. Note that markings and structural transitions take over the role of states and actions, respectively. The set of markings reachable under → from some *initial marking* m̂ is denoted M_reach.
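The firing rule above translates directly into code. A minimal sketch, assuming markings and the weight function W are encoded as Python dicts (our own encoding; missing arcs default to weight 0):

```python
def enabled_in(m, t, W):
    """A structural transition t is enabled in marking m iff every place p
    holds at least W(p, t) tokens."""
    return all(m[p] >= W.get((p, t), 0) for p in m)

def fire(m, t, W):
    """The firing rule: m --t--> m' with m'(p) = m(p) - W(p, t) + W(t, p)."""
    assert enabled_in(m, t, W)
    return {p: m[p] - W.get((p, t), 0) + W.get((t, p), 0) for p in m}

# Hypothetical two-place net: t1 moves one token from p1 to p2.
W = {("p1", "t1"): 1, ("t1", "p2"): 1}
m0 = {"p1": 1, "p2": 0}
```

Firing t1 in m0 yields the marking with the token moved to p2, after which t1 is disabled, matching the LSTS view of markings as states.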

*Example 2.* Consider the Petri net with initial marking $\hat{m}$ below on the left. Here, all arcs are weighted 1, except for the arc from $p_5$ to $t_2$, which is weighted 2. Its LSTS is infinite, but the reachable substructure is depicted on the right. The number of tokens in each of the places $p_1, \dots, p_6$ is inscribed in the nodes; the state labels (if any) are written beside the nodes.

The LSTS practically coincides with the counter-example of Section 3. Only the self-loops are missing and the state labelling, with atomic propositions q, $q_p$ and $q_l$, differs slightly; the latter will be explained later. For now, note that t and $t_{key}$ are invisible and that the trace $\emptyset\{q_p\}\emptyset\emptyset\{q\}$, which occurs when firing transitions $t_1 t_2 t t_3$ from $\hat{m}$, can be lost when reducing with weak stubborn sets.

In the remainder of this section, we fix a Petri net $(P, T, W, \hat{m})$ and its LSTS $(M, \to, \hat{m}, L)$. Below, we consider three different types of atomic propositions. Firstly, polynomial propositions [4] are of the shape $f(p_1,\dots,p_n) \bowtie k$, where f is a polynomial over $p_1,\dots,p_n$, ${\bowtie} \in \{<, \leq, >, \geq, =, \neq\}$ and $k \in \mathbb{Z}$. Such a proposition holds in a marking m iff $f(m(p_1),\dots,m(p_n)) \bowtie k$. A linear proposition [10] is similar, but the function f over places must be linear and $f(0,\dots,0) = 0$, *i.e.*, linear propositions are of the shape $k_1 p_1 + \dots + k_n p_n \bowtie k$, where $k_1,\dots,k_n,k \in \mathbb{Z}$. Finally, we have arbitrary propositions [22], whose shape is not restricted and which can hold in any given set of markings.
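Evaluating these propositions over a marking is straightforward; the following sketch is illustrative only, with the comparison ⋈ fixed to concrete operators and the propositions (including $q_p$ from the counter-example later in this section) chosen as hypothetical examples.

```python
import operator

def linear_prop(coeffs, rel, k):
    """k1*p1 + ... + kn*pn  rel  k, with rel one of <, <=, >, >=, ==, !=."""
    def holds(m):
        return rel(sum(c * m[p] for p, c in coeffs.items()), k)
    return holds

def poly_prop(f, rel, k):
    """f(m(p1), ..., m(pn))  rel  k  for an arbitrary polynomial f."""
    def holds(m):
        return rel(f(m), k)
    return holds

m = {"p3": 0, "p5": 2}
q_l = linear_prop({"p3": 1, "p5": 1}, operator.ge, 2)                 # p3 + p5 >= 2
q_p = poly_prop(lambda m: (1 - m["p3"]) * (1 - m["p5"]), operator.eq, 1)
print(q_l(m), q_p(m))  # True False
```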

Several other types of atomic propositions can be encoded as polynomial propositions. For example, *fireable*(t) [4,10], which holds in a marking m iff t is enabled in m, can be encoded as $\prod_{p \in P} \prod_{i=0}^{W(p,t)-1} (p - i) \geq 1$. The proposition *deadlock*, which holds in markings where no structural transition is enabled, does not require special treatment in the context of POR, since it is already preserved by **D1** and **D2w**. The sets containing all linear and polynomial propositions are henceforward called $AP_l$ and $AP_p$, respectively. The corresponding labelling functions are defined as $L_l(m) = L(m) \cap AP_l$ and $L_p(m) = L(m) \cap AP_p$ for all markings m. Below, the two stutter equivalences $\triangleq_{L_l}$ and $\triangleq_{L_p}$ that follow from the new labelling functions are abbreviated $\triangleq_l$ and $\triangleq_p$, respectively. Note that $AP \supseteq AP_p \supseteq AP_l$ and ${\triangleq} \subseteq {\triangleq_p} \subseteq {\triangleq_l}$.
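To see why the falling-factorial encoding of *fireable*(t) works: $\prod_{i=0}^{W(p,t)-1}(m(p)-i)$ contains the factor $m(p)-m(p)=0$ whenever $m(p) < W(p,t)$, and is at least 1 otherwise. A small numeric check (net and weights hypothetical):

```python
from math import prod

def fireable_poly(m, t, W, places):
    """Polynomial encoding of fireable(t): product of falling factorials >= 1."""
    total = prod(
        prod(m[p] - i for i in range(W.get((p, t), 0)))
        for p in places
    )
    return total >= 1

def enabled(m, t, W, places):
    return all(m[p] >= W.get((p, t), 0) for p in places)

W = {("p1", "t"): 2}  # t consumes 2 tokens from p1
for tokens in range(4):
    m = {"p1": tokens}
    assert fireable_poly(m, "t", W, ["p1"]) == enabled(m, "t", W, ["p1"])
```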

For the purpose of introducing several variants of invisibility, we reformulate and generalise the definition of invisibility from Section 2. Given an atomic proposition $q \in AP$, a relation $R \subseteq M \times M$ is q*-invisible* if and only if $(m, m') \in R$ implies $q \in L(m) \Leftrightarrow q \in L(m')$. We consider a structural transition t q-invisible iff its corresponding relation $\{(m, m') \mid m \xrightarrow{t} m'\}$ is q-invisible. Invisibility is also lifted to sets of atomic propositions: given a set $AP' \subseteq AP$, relation R is $AP'$*-invisible* iff it is q-invisible for all $q \in AP'$. If R is AP-invisible, we plainly say that R is *invisible*. $AP'$-invisibility and invisibility carry over to structural transitions. We sometimes refer to invisibility as *ordinary invisibility* for emphasis. Note that the set of invisible structural transitions I is no longer an under-approximation, but contains exactly those structural transitions t for which $m \xrightarrow{t} m'$ implies $L(m) = L(m')$ (cf. Section 2).

We are now ready to introduce three orthogonal variations on invisibility. Firstly, a relation $R \subseteq M \times M$ is *reach* q*-invisible* [21] iff $R \cap (M_{reach} \times M_{reach})$ is q-invisible, *i.e.*, all pairs of reachable markings $(m, m') \in R$ agree on the labelling of q. Secondly, R is *value* q*-invisible* if (i) q is polynomial and for all $(m, m') \in R$, $f(m(p_1),\dots,m(p_n)) = f(m'(p_1),\dots,m'(p_n))$; or if (ii) q is not polynomial and R is q-invisible. Intuitively, this means that the value of the polynomial f never changes between two markings $(m, m') \in R$. Reach and value invisibility are lifted to structural transitions and sets of atomic propositions as before, *i.e.*, by taking $R = \{(m, m') \mid m \xrightarrow{t} m'\}$ when considering invisibility of t. Finally, we introduce another way to lift invisibility to structural transitions: t is *strongly* q*-invisible* iff the set $\{(m, m') \mid \forall p \in P : m'(p) = m(p) + W(t,p) - W(p,t)\}$ is q-invisible. Strong invisibility does not take the presence of a transition $m \xrightarrow{t} m'$ into account, and purely reasons about the effects of t. Value invisibility and strong invisibility are new in the current work, although strong invisibility was inspired by the notion of invisibility that is proposed by Varpaaniemi in [22].

**Fig. 5:** Lattice of sets of invisible actions. Arrows represent a subset relation.

We indicate the sets of all value, reach and strongly invisible structural transitions with $I_v$, $I_r$ and $I_s$, respectively. Since $I_v \subseteq I$, $I_s \subseteq I$ and $I \subseteq I_r$, the set of all their possible combinations forms the lattice shown in Figure 5. In the remainder, the weak equivalence relations that follow from each of the eight invisibility notions are abbreviated, *e.g.*, $\sim_{I^r_{sv}}$ becomes $\sim^r_{sv}$.

**Fig. 6:** Two lattices containing variations of weak equivalence and stutter equivalence, respectively. Solid arrows indicate a subset relation inside the lattice; dotted arrows follow from the indicated theorems and show when the LSTS of a Petri net is labelled consistently.

*Example 3.* Consider again the Petri net and LSTS from Example 2. We can define $q_l$ and $q_p$ as linear and polynomial propositions, respectively:


This yields the state labelling which is shown in Example 2. 

Given a weak equivalence relation $R_\sim$ and a stutter equivalence relation $R_\triangleq$, we write $R_\sim \Rightarrow R_\triangleq$ to indicate that $R_\sim$ and $R_\triangleq$ yield consistent labelling. We spend the rest of this section investigating under which notions of invisibility and propositions from the literature the LSTS of a Petri net is labelled consistently. More formally, we check for each weak equivalence relation $R_\sim$ and each stutter equivalence relation $R_\triangleq$ whether $R_\sim \Rightarrow R_\triangleq$. This tells us when existing stubborn set theory can be applied without problems. The two lattices containing all weak and stutter equivalence relations are depicted in Figure 6; each dotted arrow represents a consistent labelling result. Before we continue, we first introduce an auxiliary lemma.

**Lemma 5.** *Let* I *be a set of invisible structural transitions and* L *some labelling function. If for all* $t \in I$ *and paths* $\pi = m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} \dots$ *and* $\pi' = m_0 \xrightarrow{t} m'_0 \xrightarrow{t_1} m'_1 \xrightarrow{t_2} \dots$*, it holds that* $\pi \triangleq_L \pi'$*, then* $\sim_I \Rightarrow \triangleq_L$*.*

*Proof.* We assume that the following holds for all paths and t ∈ I:

$$m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} \dots \;\triangleq_L\; m_0 \xrightarrow{t} m'_0 \xrightarrow{t_1} m'_1 \xrightarrow{t_2} \dots \tag{\dagger}$$

We consider two initial paths π and π′ such that $\pi \sim_I \pi'$ and prove that $\pi \triangleq_L \pi'$. The proof proceeds by induction on the combined number of invisible structural transitions (taken from I) in π and π′. In the base case, π and π′ contain only visible structural transitions, and $\pi \sim_I \pi'$ implies $\pi = \pi'$ since Petri nets are deterministic. Hence, $\pi \triangleq_L \pi'$.

For the induction step, we take as hypothesis that, for all initial paths π and π′ that together contain at most k invisible structural transitions, $\pi \sim_I \pi'$ implies $\pi \triangleq_L \pi'$. Let π and π′ be two arbitrary initial paths such that $\pi \sim_I \pi'$ and the total number of invisible structural transitions contained in π and π′ is k. We consider the case where an invisible structural transition is introduced in π′; the other case is symmetric. Let $\pi' = \sigma_1 \sigma_2$ for some $\sigma_1$ and $\sigma_2$. Let $t \in I$ be some invisible structural transition and $\pi'' = \sigma_1 t \sigma'_2$ such that $\sigma_2$ and $\sigma'_2$ contain the same sequence of structural transitions. Clearly, we have $\pi' \sim_I \pi''$. Here, we can apply our original assumption (†) to conclude that $\sigma_2 \triangleq_L t\sigma'_2$, *i.e.*, the extra stuttering step t does not affect the labelling of the remainder of π″. Hence, we have $\pi' \triangleq_L \pi''$ and, with the induction hypothesis, $\pi \triangleq_L \pi''$. Note that π and π″ together contain k + 1 invisible structural transitions.

In case π and π′ together contain an infinite number of invisible structural transitions, that $\pi \sim_I \pi'$ implies $\pi \triangleq_L \pi'$ follows from the fact that the same holds for all finite prefixes of π and π′ that are related by $\sim_I$. □

The following theorems each focus on a class of atomic propositions and show which notion of invisibility is required for the LSTS of a Petri net to be labelled consistently. In the proofs, we use a function $d_t$, defined as $d_t(p) = W(t,p) - W(p,t)$ for all places p, which indicates how structural transition t changes the state. Furthermore, we also consider functions of type $P \to \mathbb{N}$ as vectors of type $\mathbb{N}^{|P|}$. This allows us to compute the pairwise addition of a marking m with $d_t$ (written $m + d_t$) and to indicate that t does not change the marking ($d_t = 0$).
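The effect vector $d_t$ can be computed once per transition, after which firing t is pointwise addition. A minimal sketch with hypothetical weights:

```python
def effect(t, W, places):
    """d_t(p) = W(t, p) - W(p, t): the net change t causes on each place."""
    return {p: W.get((t, p), 0) - W.get((p, t), 0) for p in places}

def fire(m, t, W):
    return {p: m[p] - W.get((p, t), 0) + W.get((t, p), 0) for p in m}

W = {("p1", "t"): 1, ("t", "p2"): 1}
places = ["p1", "p2"]
d_t = effect("t", W, places)                       # {'p1': -1, 'p2': 1}
m = {"p1": 2, "p2": 0}
# Firing t coincides with adding the effect vector d_t to the marking.
assert fire(m, "t", W) == {p: m[p] + d_t[p] for p in places}
```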

**Theorem 4.** *Under reach value invisibility, the LSTS underlying a Petri net is labelled consistently for linear propositions,* i.e.*,* $\sim^r_v \Rightarrow \triangleq_l$*.*

*Proof.* Let $t \in I^r_v$ be a reach value invisible structural transition such that there exist reachable markings m and m′ with $m \xrightarrow{t} m'$. If such a t does not exist, then $\sim^r_v$ is the reflexive relation and $\sim^r_v \Rightarrow \triangleq_l$ is trivially satisfied. Otherwise, let $q := f(p_1,\dots,p_n) \bowtie k$ be a linear proposition. Since t is reach value invisible and f is linear, we have $f(m) = f(m') = f(m + d_t) = f(m) + f(d_t)$ and thus $f(d_t) = 0$. It follows that, given two paths $\pi = m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} \dots$ and $\pi' = m_0 \xrightarrow{t} m'_0 \xrightarrow{t_1} m'_1 \xrightarrow{t_2} \dots$, the addition of t does not influence f, since $f(m_i) = f(m_i) + f(d_t) = f(m_i + d_t) = f(m'_i)$ for all i. As a consequence, t also does not influence q. With Lemma 5, we deduce that $\sim^r_v \Rightarrow \triangleq_l$. □

Whereas in the linear case one can easily conclude that π and π′ are stutter equivalent under f, in the polynomial case we need to show that f is constant under all value invisible structural transitions t, even in markings where t is not enabled. This follows from the following proposition.

**Proposition 1.** *Let* $f : \mathbb{N}^n \to \mathbb{Z}$ *be a polynomial function,* $a, b \in \mathbb{N}^n$ *two constant vectors and* $c = a - b$ *the difference between* a *and* b*. Assume that for all* $x \in \mathbb{N}^n$ *such that* $x \geq b$*, where* ≥ *denotes pointwise comparison, it holds that* $f(x) = f(x + c)$*. Then,* f *is constant in the vector* c*,* i.e.*,* $f(x) = f(x + c)$ *for all* $x \in \mathbb{N}^n$*.*

*Proof.* Let f, a, b and c be as above and let $\mathbf{1} \in \mathbb{N}^n$ be the vector containing only ones. Given some arbitrary $x \in \mathbb{N}^n$, consider the function $g_x(t) = f(x + t \cdot \mathbf{1} + c) - f(x + t \cdot \mathbf{1})$. For sufficiently large t, it holds that $x + t \cdot \mathbf{1} \geq b$, and it follows that $g_x(t) = 0$ for all sufficiently large t. This can only be the case if $g_x$ is the zero polynomial, *i.e.*, $g_x(t) = 0$ for all t. As a special case, we conclude that $g_x(0) = f(x + c) - f(x) = 0$. □

The intuition behind this is that f(x + c) − f(x) behaves like the directional derivative of f with respect to c. If the derivative is equal to zero in infinitely many x, f must be constant in the direction of c. We will apply this result in the following theorem.
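This intuition can be checked numerically. The polynomial, direction c and threshold b below are hypothetical examples: the hypothesis of Proposition 1 is verified only on a finite sample above b, and the conclusion is then observed at a point below b.

```python
# A polynomial constant in the direction c = (2, 2): f depends only on x - y.
def f(x, y):
    return (x - y) ** 3 + 2 * (x - y)

c = (2, 2)
b = (4, 4)

# Hypothesis of Proposition 1: f(v) == f(v + c) for v >= b (finite sample).
for x in range(4, 10):
    for y in range(4, 10):
        assert f(x, y) == f(x + c[0], y + c[1])

# Conclusion: the same equality also holds below b.
assert f(0, 1) == f(0 + c[0], 1 + c[1])   # both equal -3
```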

**Theorem 5.** *Under value invisibility, the LSTS underlying a Petri net is labelled consistently for polynomial propositions,* i.e.*,* $\sim_v \Rightarrow \triangleq_p$*.*

*Proof.* Let $t \in I_v$ be a value invisible structural transition, m and m′ two markings with $m \xrightarrow{t} m'$, and $q := f(p_1,\dots,p_n) \bowtie k$ a polynomial proposition. Note that infinitely many such (not necessarily reachable) markings exist in M, so we can apply Proposition 1 to obtain $f(m) = f(m + d_t)$ for all markings m. It follows that, given two paths $\pi = m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} \dots$ and $\pi' = m_0 \xrightarrow{t} m'_0 \xrightarrow{t_1} m'_1 \xrightarrow{t_2} \dots$, the addition of t does not alter the value of f, since $f(m_i) = f(m_i + d_t) = f(m'_i)$ for all i. As a consequence, t also does not change the labelling of q. Application of Lemma 5 yields $\sim_v \Rightarrow \triangleq_p$. □

Varpaaniemi shows that the LSTS of a Petri net is labelled consistently for arbitrary propositions under his notion of invisibility [22, Lemma 9]. Our notion of strong invisibility, and especially strong reach invisibility, is weaker than Varpaaniemi's invisibility, so we generalise the result to $\sim^r_s \Rightarrow \triangleq$.

**Theorem 6.** *Under strong reach invisibility, the LSTS underlying a Petri net is labelled consistently for arbitrary propositions,* i.e.*,* $\sim^r_s \Rightarrow \triangleq$*.*

*Proof.* Let $t \in I^r_s$ be a strongly reach invisible structural transition and $\pi = m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} \dots$ and $\pi' = m_0 \xrightarrow{t} m'_0 \xrightarrow{t_1} m'_1 \xrightarrow{t_2} \dots$ two paths. Since $m'_i = m_i + d_t$ for all i, it holds that either (i) $d_t = 0$ and $m_i = m'_i$ for all i; or (ii) each pair $(m_i, m'_i)$ is contained in $\{(m, m') \mid \forall p \in P : m'(p) = m(p) + W(t,p) - W(p,t)\}$, which is the set that underlies strong reach invisibility of t. In both cases, $L(m_i) = L(m'_i)$ for all i. It follows from Lemma 5 that $\sim^r_s \Rightarrow \triangleq$. □

To show that the results of the above theorems cannot be strengthened, we provide two negative results.

**Theorem 7.** *Under ordinary invisibility, the LSTS underlying a Petri net is not necessarily labelled consistently for arbitrary propositions,* i.e.*,* $\sim \not\Rightarrow \triangleq$*.*

*Proof.* Consider the Petri net from Example 2 with the arbitrary proposition $q_l$. Disregard $q_p$ for the moment. Structural transition t is $q_l$-invisible, hence the paths corresponding to $t_1 t_2 t t_3$ and $t t_1 t_2 t_3$ are weakly equivalent under ordinary invisibility. However, they are not stutter equivalent. □

**Theorem 8.** *Under reach value invisibility, the LSTS underlying a Petri net is not necessarily labelled consistently for polynomial propositions,* i.e.*,* $\sim^r_v \not\Rightarrow \triangleq_p$*.*

*Proof.* Consider the Petri net from Example 2 with the polynomial proposition $q_p := (1 - p_3)(1 - p_5) = 1$ from Example 3. Disregard $q_l$ in this reasoning. Structural transition t is reach value $q_p$-invisible, hence the paths corresponding to $t_1 t_2 t t_3$ and $t t_1 t_2 t_3$ are weakly equivalent under reach value invisibility. However, they are not stutter equivalent for polynomial propositions. □

It follows from Theorems 7 and 8 and transitivity of ⊆ that Theorems 4, 5 and 6 cannot be strengthened further. In terms of Figure 6, this means that the dotted arrows cannot be moved downward in the lattice of weak equivalences and cannot be moved upward in the lattice of stutter equivalences. The implications of these findings on related work will be discussed in the next section.

## **7 Related Work**

There are many works in the literature that apply stubborn sets. We consider several works that aim to preserve LTL−X and discuss whether they are affected by the problem presented in the current work.

Liebke and Wolf [10] present an approach for efficient CTL model checking on Petri nets. For some formulas, they can reduce CTL model checking to LTL model checking, which allows greater reductions under POR. They rely on the incorrect LTL preservation theorem, and since they apply the techniques on Petri nets with ordinary invisibility, their theory is incorrect (Theorem 7). Similarly, the overview of stubborn set theory presented by Valmari and Hansen in [21] applies reach invisibility and does not necessarily preserve LTL−<sup>X</sup>. Varpaaniemi [22] also applies stubborn sets to Petri nets, but relies on a visibility notion that is stronger than strong invisibility. The correctness of these results is thus not affected (Theorem 6). The approach of Bønneland *et al.* [4] operates on two-player Petri nets, but only aims to preserve reachability and consequently does not suffer from the inconsistent labelling problem.

A generic implementation of weak stubborn sets is proposed by Laarman *et al.* [9]. They use abstract concepts such as guards and transition groups to implement POR in a way that is agnostic of the input language. The theory they present includes condition **D1**, which is too weak, but the accompanying implementation follows the framework of Section 4.1, and thus it is correct by Theorem 2. The implementations proposed in [21,23] are similar, albeit specific to Petri nets.

Others [6,8] perform action-based model checking and thus strive to preserve weak trace equivalence or inclusion. As such, they do not suffer from the problem discussed here, which applies only to state labels.

Although Beneš *et al.* [2,3] rely on ample sets, and not on stubborn sets, they also discuss weak trace equivalence and stutter-trace equivalence. In fact, they present an equivalence relation for traces that is a combination of weak and stutter equivalence. The paper includes a theorem that weak equivalence implies their new state/event equivalence [2, Theorem 6.5]. However, the counterexample on the right shows that this consistent labelling theorem does not hold. Here, the action τ is invisible, and the two paths in this transition system are thus weakly equivalent. However, they are not stutter equivalent, which is a special case of state/event equivalence. Although the main POR correctness result [2, Corollary 6.6] builds on the incorrect consistent labelling theorem, its correctness does not appear to be affected. An alternative proof can be constructed based on Lemmas 1 and 2.

The current work is not the first to point out mistakes in POR theory. In [14], Siegel presents a flaw in an algorithm that combines POR and on-the-fly model checking [12]. In that setting, POR is applied on the product of an LSTS and a Büchi automaton. Let q be a state of the LSTS and s a state of the Büchi automaton. While investigating a transition $(q, s) \xrightarrow{a} (q', s')$, condition **C3**—which, like condition **L**, aims to solve the action ignoring problem—incorrectly sets $r(q, s') = enabled(q)$ instead of $r(q, s) = enabled(q)$.

## **8 Conclusion**

We discussed the inconsistent labelling problem for preservation of stutter-trace equivalence with stubborn sets. The issue is relatively easy to repair by strengthening condition **D1**. For Petri nets, altering the definition of invisibility can also resolve inconsistent labelling depending on the type of atomic propositions. The impact on applications presented in related works seems to be limited: the problem is typically mitigated in the implementation, since it is very hard to compute **D1** exactly. This is also a possible explanation for why the inconsistent labelling problem has not been noticed for so many years.

Since this is not the first error found in POR theory [14], a more rigorous approach to proving its correctness, *e.g.* using proof assistants, would provide more confidence.

## **References**

1. Baier, C., Katoen, J.P.: Principles of model checking. MIT Press (2008)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Semantical Analysis of Contextual Types**

Brigitte Pientka<sup>1</sup> and Ulrich Schöpp<sup>2</sup>(✉)

<sup>1</sup> McGill University, Montreal, Canada, bpientka@cs.mcgill.ca <sup>2</sup> fortiss GmbH, Munich, Germany, schoepp@fortiss.org

**Abstract.** We describe a category-theoretic semantics for a simply typed variant of Cocon, a contextual modal type theory where the box modality mediates between the weak function space that is used to represent higher-order abstract syntax (HOAS) trees and the strong function space that describes (recursive) computations about them. What makes Cocon different from standard type theories is the presence of first-class contexts and contextual objects to describe syntax trees that are closed with respect to a given context of assumptions. Following M. Hofmann's work, we use a presheaf model to characterise HOAS trees. Surprisingly, this model already provides the necessary structure to also model Cocon. In particular, we can capture the contextual objects of Cocon using a comonad that restricts presheaves to their closed elements. This gives a simple semantic characterisation of the invariants of contextual types (e.g. substitution invariance) and identifies Cocon as a type-theoretic syntax of presheaf models. We express our category-theoretic constructions by using a modal internal type theory that is implemented in Agda-Flat.

## **1 Introduction**

A fundamental question when defining, implementing, and working with languages and logics is: how do we represent and analyse syntactic structures? Higher-order abstract syntax [19] (or lambda-tree syntax [17]) provides a deceptively simple answer to this question. The basic idea is to uniformly map binding structures in our object language (OL) to the function space of a meta-language, thereby inheriting α-renaming and capture-avoiding substitution. In the logical framework LF [10], for example, we can define a small functional programming language consisting of functions, function application, and let-expressions using a type tm as follows:

```
lam  : (tm → tm) → tm.
letv : tm → (tm → tm) → tm.
app  : tm → tm → tm.
```

The object-language term (lam x. lam y. let w = x y in w y) is then encoded as lam λx.lam λy.letv (app x y) λw.app w y using the LF abstractions to model binding. Object-level substitution is modelled through LF application; for instance, the fact that ((lam x.M) N) reduces to [N/x]M in our object language is expressed as (app (lam M) N) reducing to (M N).
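The substitution-through-application idea can be mimicked with host-language closures. The following Python sketch (class and function names hypothetical, not from the paper) mirrors the LF signature above: binders are meta-level functions, so a root beta-step is literally function application.

```python
from dataclasses import dataclass
from typing import Callable

class Tm: ...

@dataclass
class App(Tm):
    fun: Tm
    arg: Tm

@dataclass
class Lam(Tm):
    body: Callable[[Tm], Tm]   # binding via a meta-level function

@dataclass
class Letv(Tm):
    bound: Tm
    body: Callable[[Tm], Tm]

# lam x. lam y. let w = x y in w y
ex = Lam(lambda x: Lam(lambda y: Letv(App(x, y), lambda w: App(w, y))))

def step(t: Tm) -> Tm:
    """One beta-step at the root: (app (lam M) N) reduces to (M N)."""
    if isinstance(t, App) and isinstance(t.fun, Lam):
        return t.fun.body(t.arg)   # substitution is meta-level application
    return t

i = Lam(lambda z: z)
assert step(App(i, i)) is i   # (lam z. z) applied to itself
```

Note that, exactly as the next paragraphs discuss, nothing in this sketch prevents `Lam` from wrapping a non-substitution-invariant Python function; ruling that out is what the presheaf semantics is for.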

This approach is elegant and can offer substantial benefits: we can treat objects equivalent modulo renaming and do not need to define object-level substitution.

However, we want not only to construct HOAS trees, but also to analyse them and to select sub-trees. This is challenging, as sub-trees are context-sensitive. For example, the term letv (app x y) λw.app w y only makes sense in a context x:tm, y:tm. Moreover, one cannot simply extend LF to allow syntax analysis. If one added a recursion combinator to LF, then it could be used to define many functions M : tm → tm for which lam M would not represent an object-level syntax term [12].

Contextual types [18,20] offer a type-theoretic solution to these problems by reifying the typing judgement, i.e. that letv (app x y) λw.app w y has type tm in the context x:tm, y:tm, as a *contextual type* (x:tm, y:tm ⊢ tm). The contextual type (x:tm, y:tm ⊢ tm) describes a set of terms of type tm that may contain variables x and y. In particular, the contextual object (x, y ⊢ letv (app x y) λw.app w y) has the given contextual type. By abstracting over contexts and treating contexts as first-class, we can now recursively analyse HOAS trees [20,25,21]. Recently, [23] further generalised these ideas and presented a contextual modal type theory, Cocon, where we can mix HOAS trees and computations, i.e. we can use (recursive) computations to analyse and traverse (contextual) HOAS trees and we can embed computations within HOAS trees. This line of work provides a syntactic perspective on the question of how to represent and analyse syntactic structures with binders, as it focuses on decidability of type checking and normalisation. However, its semantics remains not well understood. What is the semantic meaning of a contextual type? Can we semantically justify the given induction principles? What is the semantics of a first-class context?

While a number of closely related categorical models of abstract syntax with bindings [12,8,9] were proposed around 2000, the relationship of these models to concrete type-theoretic languages for computing with HOAS structures was tenuous. In this paper, we give a category-theoretic semantics for Cocon (for simply-typed HOAS). This provides a semantic perspective on contextual types and first-class contexts. Maybe surprisingly, the presheaf model introduced by Hofmann [12] already provides the necessary structure to also model contextual modal type theory. Besides the standard structure of this model, we only need two additional concepts: a ♭-modality and a cartesian closed universe of representables. For simplicity and lack of space, we focus on the special case of Cocon where the HOAS trees are simply-typed. Concentrating on the simply-typed setting allows us to introduce the main idea without the additional complexity that type dependencies bring with them. We outline the dependently-typed case in Sec. 6.

Our work provides a semantic foundation for Cocon and can serve as a starting point to investigate connections to similar work. First, our work connects Cocon to other work on internal languages for presheaf categories with a ♭-modality, such as spatial type theory [27] or crisp type theory [16]. Second, it may help to understand the relation of Cocon to type theories that use a modality for metaprogramming and intensional recursion, such as [15]. While Cocon is built on the same general ideas, a main difference seems to be that Cocon distinguishes between HOAS trees and computations, even though it allows mixed use of them. We hope to clarify the relation by providing a semantical perspective.

## **2 Presheaves for Higher-Order Abstract Syntax**

Our work begins with the presheaf models for HOAS of [12,8]. The key idea of those approaches is to integrate substitution-invariance in the computational universe in a controlled way. For the representation of abstract syntax, one wants to allow only substitution-invariant constructions. For example, lam M represents an object-level abstraction if and only if M is a function that uses its argument in a substitution-invariant way. For computation with abstract syntax, on the other hand, one wants to allow non-substitution-invariant constructions too. Presheaf categories allow one to choose the desired amount of substitution-invariance.

Let D be a small category. The presheaf category $\widehat{D}$ is defined to be the category $\mathsf{Set}^{D^{op}}$. Its objects are functors $F : D^{op} \to \mathsf{Set}$, which are also called *presheaves*. Such a functor F is given by a set F(Ψ) for each object Ψ of D together with a function $F(\sigma) : F(\Phi) \to F(\Psi)$ for any object Φ and $\sigma : \Psi \to \Phi$ in D, subject to the functor laws. The intuition is that F defines sets of elements in various D-contexts, together with a D-substitution action. A morphism $f : F \to G$ is a natural transformation, which is a family of functions $f_\Psi : F(\Psi) \to G(\Psi)$ for any Ψ. This family of functions must be natural, i.e. commute with substitution: $f_\Psi \circ F(\sigma) = G(\sigma) \circ f_\Phi$.

For the purposes of modelling higher-order abstract syntax, D will typically be the term model of some domain-level lambda-calculus. By domain-level, we mean the calculus that serves as the meta-level for object-language encodings. It is the calculus that contains constants like lam and app from the Introduction. We call it domain-level to avoid possible confusion between different meta-levels later. For simplicity, let us for now use a simply-typed lambda-calculus with functions and products as the domain language. It is sufficient to encode the example from the Introduction and allows us to explain the main idea underlying our approach.

The term model of the simply-typed domain-level lambda-calculus forms a cartesian closed category D. The objects of D are contexts $x_1{:}A_1, \dots, x_n{:}A_n$ of simple types. We use Φ and Ψ to range over such contexts. A morphism from $x_1{:}A_1, \dots, x_n{:}A_n$ to $x_1{:}B_1, \dots, x_m{:}B_m$ is a tuple $(t_1, \dots, t_m)$ of terms $x_1{:}A_1, \dots, x_n{:}A_n \vdash t_i : B_i$ for $i = 1, \dots, m$. A morphism of type $\Psi \to \Phi$ in D thus amounts to a (domain-level) substitution that provides a (domain-level) term in context Ψ for each of the variables in Φ. Terms are identified up to αβη-equality. One may achieve this by using a de Bruijn encoding, for example, but the specific encoding is not important for this paper. The terminal object is the empty context, which we denote by 1, and the product Φ × Ψ is defined by context concatenation. It is not hard to see that any object $x_1{:}A_1, \dots, x_n{:}A_n$ is isomorphic to an object that is given by a context with a single variable, namely $x_1{:}(A_1 \times \dots \times A_n)$. This is to say that contexts can be identified with product types. In view of this isomorphism, we shall allow ourselves to consider the objects of D also as types and vice versa. The category D is cartesian closed, the exponential of Φ and Ψ being given by the function type Φ → Ψ (where the objects are considered as types).

The presheaf category D̂ is a computational universe that both embeds the term model D and can represent computations about it. Note that we cannot just enrich D with terms for computations if we want to use HOAS. In a simply-typed lambda-calculus with just the constant terms app: tm → tm → tm and lam: (tm → tm) → tm, each term of type tm represents an object-level term. This would no longer be true if we were to allow computations in the domain language, since one could define M to be something like (λx. **if** x represents an object-level application **then** M1 **else** M2) for distinct M1 and M2. In this case, lam M would not represent an object-level term anymore. If we want to preserve a bijection between the object-level terms and their representations in the domain language, we cannot allow case-distinction over whether a term represents an object-level application.

The category D̂ unites syntax with computations by allowing one to enforce varying degrees of substitution-invariance: by choosing objects with different substitution actions, one controls how much substitution-invariance is required.

In one extreme, a set S can be represented by the constant presheaf ΔS with ΔS(Ψ) = S and ΔS(σ) = id for all Ψ and σ. The substitution action is trivial. As a consequence, a morphism ΔS → ΔT amounts to a function from set S to set T, since the trivial choice of the substitution action makes the naturality condition vacuous.

The Yoneda embedding represents the other extreme. For any object Φ of D, the presheaf y(Φ): D^op → Set is defined by y(Φ)(Ψ) = D(Ψ, Φ), which is the set of morphisms from Ψ to Φ in D. The functor action is pre-composition. The presheaf y(Φ) should be understood as the type of all domain-level substitutions with codomain Φ. An important example is Tm := y(tm). In this case, Tm(Ψ) is the set of all morphisms of type Ψ → tm in D. By the definition of D, these correspond to domain-level terms of type tm in context Ψ. In this way, the presheaf Tm represents the domain-level terms of type tm.

The Yoneda embedding does in fact embed D into D̂ fully and faithfully. It becomes a functor y: D → D̂ if one defines the morphism action to be post-composition. This means that y maps a morphism σ: Ψ → Φ in D to the natural transformation y(σ): y(Ψ) → y(Φ) that is defined by post-composing with σ. This makes y into a functor y: D → D̂ that is moreover full and faithful: its action on morphisms is a bijection from D(Ψ, Φ) to D̂(y(Ψ), y(Φ)) for any Ψ and Φ. This is because a natural transformation f: y(Ψ) → y(Φ) is, by naturality, uniquely determined by f_Ψ(id), where id ∈ D(Ψ, Ψ) = y(Ψ)(Ψ), and f_Ψ(id) is an element of y(Φ)(Ψ) = D(Ψ, Φ).

Since D embeds into D̂ fully and faithfully, the term model of the domain language is available in D̂. Consider for example Tm = y(tm). Since y is full and faithful, the morphisms from Tm to Tm in D̂ are in one-to-one correspondence with the morphisms from tm to tm in D. These, in turn, are defined to be substitutions and correspond to simply-typed (domain-level) lambda terms with one free variable. This shows that substitution invariance cuts down the morphisms from Tm to Tm in D̂ just as much as one would like for HOAS encodings.

But D̂ contains not just a term model of the domain language. It can also represent computations about the domain-level syntax and computations that are not substitution-invariant. For example, arbitrary functions on terms can be represented as morphisms from the constant presheaf Δ(Tm(1)) to Tm. Recall that 1 is the empty context, so that Tm(1) is the set D(1, tm), by definition, which is isomorphic to the set of closed domain-level terms of type tm. The morphisms from Δ(Tm(1)) to Tm in D̂ correspond to arbitrary functions from closed terms to closed terms, without any restriction of substitution invariance.

The restriction to the constant presheaf of closed terms can be generalised to arbitrary presheaves. Define a functor □: D̂ → D̂ by letting □F be the constant presheaf Δ(F(1)), i.e. (□F)(Ψ) = F(1) and (□F)(σ) = id. Thus, □ restricts any presheaf to the set of its closed elements. The functor □ defines a comonad where the counit ε_F : □F → F is the obvious inclusion and the comultiplication ν_F : □F → □□F is the identity. The latter means that the comonad is idempotent.
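As a sanity check, the □ construction can be mimicked on a toy presheaf; the following Python sketch (a hypothetical representation: a presheaf is just a dict of sets indexed by objects, with the substitution action left implicit) shows that □ keeps only the closed elements and is idempotent:

```python
def box(F, terminal="1"):
    """Send a presheaf F (dict: object -> set of elements) to the
    constant presheaf at F(1), i.e. keep only the closed elements.
    The substitution action of the result is the identity."""
    return {obj: set(F[terminal]) for obj in F}

# A toy 'Tm'-like presheaf: two closed terms 'c1', 'c2' everywhere,
# plus one extra element 'x' that exists only in a non-empty context.
Tm = {"1": {"c1", "c2"}, "x:tm": {"c1", "c2", "x"}}

boxed = box(Tm)
```

Here `boxed` assigns {"c1", "c2"} to every context, and box (box Tm) equals box Tm, reflecting that the comultiplication is the identity.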

## **3 Internal Language**

To explain how D̂ models higher-order abstract syntax and contextual types, we need to expose more of its structure. Most of this structure is standard. Defining it directly in terms of functors and natural transformations is somewhat laborious, and the technical details may obscure the basic idea of our approach.

We therefore use the internal type theory of D̂ as a meta-language for working with its structure. The structure of D̂ furnishes a model of a dependent type theory that supports dependent products, dependent sums and extensional identity types, among others, in a standard way [11]. We use Agda notation for the types and terms of this internal type theory. We write (x: S) → T for a dependent function type and λx: S.m and m n for the associated lambda-abstractions and applications. As usual, we will sometimes also write S → T for (x: S) → T if x does not appear in T. However, to make it easier to distinguish the function spaces at various levels, we will write (x: S) → T by default even when x does not appear in T. We use let x = m in n as an abbreviation for (λx: T.n) m, as usual. For two terms m: T and n: T, we write m =_T n or just m = n for the associated identity type. Our notation is similar to Agda's, since the internal type theory can be seen as a fragment of Agda's type theory. Agda has been useful as a tool for type-checking our constructions in the internal type theory [1].

In the spirit of Martin-Löf type theory, we will define basic types and terms successively as they are needed. In the Agda development this corresponds to postulating constants that are justified by the interpretation in D̂. In the following sections, we will expose the structure of D̂ step by step until we have enough to interpret contextual types.

While much of the structure of D̂ can be captured by adding rules and constants to standard Martin-Löf type theory, for the comonad □ such a formulation would not be very satisfactory. The issues are discussed by Shulman [27, p.7], for example. To obtain a more satisfactory syntax for the comonad, we refine the internal type theory into a modal type theory in which □ appears as a necessity modality. This approach goes back to [3,4,6] and is also used in recent work by Shulman [27], Licata et al. [16] and others on working with the □-modality in type theory. Agda has recently gained support for such a □-modality [29].

We summarise here the typing rules for the □-modality which we will rely on. To control the modality, one uses two kinds of variables. In addition to standard variables x: T, one has a second kind of so-called *crisp* variables x::T. Typing judgements have the form Δ | Θ ⊢ m: T, where Δ collects the crisp variables and Θ collects the ordinary variables. In essence, a crisp variable x::T represents an assumption of the form x: □T. The syntactic distinction is useful, since it leads to a type theory that is well-behaved with respect to substitution, see [6,27].

The typing rules are closely related to those in modal type systems [6,18], where Δ is the typing context for modal (global) assumptions and Θ for (local) assumptions, and type systems for linear logic [4], where Δ is the typing context for non-linear assumptions and Θ for linear assumptions.

$$\frac{}{\Delta, u::T, \Delta' \mid \Theta \vdash u : T} \qquad \frac{}{\Delta \mid \Theta, x \colon T, \Theta' \vdash x : T}$$

$$\frac{\Delta \mid \cdot \vdash m : T}{\Delta \mid \Theta \vdash \mathsf{box}\; m : \Box T} \qquad \frac{\Delta \mid \Theta \vdash m : \Box T \quad \Delta, x::T \mid \Theta \vdash n : S}{\Delta \mid \Theta \vdash \mathsf{let\ box}\; x = m\ \mathsf{in}\ n : S}$$

Given any term m: T which only depends on the crisp variable context Δ, we can form the term box m: □T. We have a let-term let box x = m in n that takes a term m: □T and binds it to a crisp variable x::T. The rules maintain the invariant that the free variables in a type □T or a term box m are all crisp variables from the crisp context Δ.

The other typing rules do not modify the crisp context. For example, the rules for dependent products are:

$$\frac{\Delta \mid \Theta, x \colon T \vdash m \colon S}{\Delta \mid \Theta \vdash \lambda x \colon T.m : (x \colon T) \to S} \quad \frac{\Delta \mid \Theta \vdash m \colon (y \colon T) \to S \quad \Delta \mid \Theta \vdash n \colon T}{\Delta \mid \Theta \vdash m\; n : [n/y]S}$$

When Δ is empty, we shall write just Θ ⊢ m: T for Δ | Θ ⊢ m: T.

## **4 From Presheaves to Contextual Types**

Armed with the internal type theory, we can now explore the structure of D̂.

#### **4.1 A Universe of Representables**

For our purposes, the main feature of D̂ is that it embeds D fully and faithfully via the Yoneda embedding. In the type theory for D̂, we may capture this embedding by means of a Tarski-style universe. Such a universe is defined by a type of codes for types together with a decoding function that maps codes to actual types.

The type of codes Obj represents the set of objects of D in the internal type theory of D̂. We have seen above that any set can be represented as a presheaf with trivial substitution action, and Obj is one such example. Particular objects of D then appear as terms of type Obj. The cartesian closed structure of D gives us terms unit, times and arrow for the terminal object 1, binary products × and the exponential → (function type). We also have a term for the domain-level type tm.

```
⊢ Obj type      ⊢ tm: Obj      ⊢ unit: Obj
⊢ times: (a: Obj) → (b: Obj) → Obj
⊢ arrow: (a: Obj) → (b: Obj) → Obj
```

Subsequently, we sometimes talk about objects of D when we intend to describe terms of type Obj (and vice versa).

The morphisms of D could similarly be encoded as a constant presheaf with many term constants, but this is in fact not necessary. Instead, we can use the Yoneda embedding as a function that decodes elements of Obj into actual types.

$$x \colon \mathbf{Obj} \vdash \mathbf{El}\, x \text{ type}$$

The function El is almost direct syntax for the Yoneda embedding. The interpretation in D̂ is such that, for any object A of D, the type El A is interpreted by the presheaf y(A). Such a presheaf is called *representable*. One can think of El A as the type of all morphisms of type Ψ → A in D for arbitrary Ψ. Recall from above that a morphism of type Ψ → A in D amounts to a domain-level term of type A that may refer to variables in Ψ. In this sense, one should think of El A as a type of domain-level terms of type A, both closed and open ones.

We get all morphisms of D, and no more, in this way, since the Yoneda embedding is full and faithful, recall Sec. 2. In our case, this means that the type (x: El A) → El B represents the morphisms of type A → B in D. Any closed term of type (x: El A) → El B corresponds to such a morphism and vice versa. This is because the naturality requirements in D̂ enforce substitution-invariance, as outlined in Sec. 2. The type (x: El A) → El B thus does not represent arbitrary functions from terms of type A to terms of type B, but only substitution-invariant ones. If a function of this type maps a domain-level variable x: A (encoded as an element of El A) to some term M: B (encoded as an element of El B), then it must map any other N: A to [N/x]M.

We note that the type dependency in El is easy to work with. A term of type (a: Obj) → (b: Obj) → (x: El a) → El b corresponds to a family of terms (x: El A) → El B indexed by objects A and B of D. This is because Obj is just a set, so that the naturality constraints of D̂ are vacuous for functions out of Obj.

To summarise, we get access to D in the internal type theory of D̂ simply by considering the Yoneda embedding as the decoding function El of a universe à la Tarski. Since it consists of the representable presheaves, we call it the *universe of representables*. The following lemmas state that the embedding preserves the terminal object, binary products and exponentials.

**Lemma 1.** *The internal type theory of* D̂ *has a term* terminal: El unit*, such that* x = terminal *holds for any* x: El unit*.*

**Lemma 2.** *The internal type theory of* D̂ *justifies the terms below, such that* fst (pair x y) = x*,* snd (pair x y) = y *and* z = pair (fst z) (snd z) *for all* x, y, z*.*

```
c: Obj, d: Obj ⊢ fst: (z: El (times c d)) → El c
c: Obj, d: Obj ⊢ snd: (z: El (times c d)) → El d
c: Obj, d: Obj ⊢ pair: (x: El c) → (y: El d) → El (times c d)
```

**Lemma 3.** *The internal type theory of* D̂ *justifies the terms below such that* arrow-i (arrow-e f) = f *and* arrow-e (arrow-i g) = g *for all* f, g*.*

```
c: Obj, d: Obj ⊢ arrow-e: (x: El (arrow c d)) → (y: El c) → El d
c: Obj, d: Obj ⊢ arrow-i: (y: El c → El d) → El (arrow c d)
```

#### **4.2 Higher-Order Abstract Syntax**

The last lemma in the previous section states that El A → El B is isomorphic to El (arrow A B). This is particularly useful to lift HOAS encodings from D to D̂. For instance, the domain-level term constant lam: (tm → tm) → tm gives rise to an element of El (arrow (arrow tm tm) tm). But this type is isomorphic to (El tm → El tm) → El tm, by the lemma.

This means that the higher-order abstract syntax constants lift to D̂:

```
app: (m: El tm) → (n: El tm) → El tm
lam: (m: El tm → El tm) → El tm
```
Once one recognises El A as y(A), the adequacy of this higher-order abstract syntax encoding lifts from D to D̂ as in [12]. For example, an argument M to lam has type El tm → El tm, which is isomorphic to El (arrow tm tm). But this type represents (open) domain-level terms t: tm → tm. The term lam M: El tm then represents the domain-level term lam t: tm, so the encoding simply lifts the domain-level one.
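The bijection behind such adequacy arguments can be made tangible with the standard HOAS trick of using meta-level functions for binders; a small Python sketch (a hypothetical encoding, not the paper's Agda development) reads HOAS terms back into first-order de Bruijn syntax:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FVar:
    """Fresh placeholder standing for a bound variable."""
    level: int

def app(m, n):
    return ("app", m, n)

def lam(f):
    # f is a meta-level function, as in HOAS: binding is inherited
    # from the meta language.
    return ("lam", f)

def to_debruijn(t, depth=0):
    """Read back a HOAS term into first-order de Bruijn syntax."""
    if isinstance(t, FVar):
        return ("var", depth - 1 - t.level)
    if t[0] == "app":
        return ("app", to_debruijn(t[1], depth), to_debruijn(t[2], depth))
    # Instantiate the binder with a fresh placeholder and recurse.
    return ("lam", to_debruijn(t[1](FVar(depth)), depth + 1))

# The object-level term lam x. lam y. app x y:
tm_example = lam(lambda x: lam(lambda y: app(x, y)))
```

Each HOAS value built from `app`, `lam` and variables reads back to exactly one first-order term, which is the bijection adequacy asks for.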

#### **4.3 Closed Objects**

One should think of □T as the type of 'closed' elements of T. In particular, □(El A) represents morphisms of type 1 → A in D; recall the definition of □ from Sec. 2 and that El A corresponds to y(A). In the term model D, the morphisms 1 → A correspond to closed domain-language terms of type A. Thus, while El A represents both open and closed domain-level terms, □(El A) represents only the closed ones.

This applies also to the type El A → El B. We have seen above that El A → El B is isomorphic to El (arrow A B) and may therefore be thought of as containing the terms of type B with a distinguished variable of type A. But these terms may contain other free domain-language variables. The type □(El A → El B), on the other hand, contains only terms of type B that may contain (at most) the one distinguished variable of type A.

Restricting to closed objects with the □-modality is useful because it disables substitution-invariance. For example, the internal type theory for D̂ justifies a function is-lam: (x: □(El tm)) → bool that returns true if and only if the argument represents a domain-language lambda-abstraction. We shall define it in the next section. Such a function cannot be defined with type El tm → bool, since it would not be invariant under substitution. Its argument ranges over terms that may be open, which in particular includes domain-level variables. The function would have to return false for them, since a domain-level variable is not a lambda-abstraction. But after substituting a lambda-abstraction for the variable, it would have to return true, so it could not be substitution-invariant.

We note that the type Obj consists only of closed elements and that Obj and □Obj happen to be definitionally equal types (an isomorphism would suffice, but equality is more convenient).

#### **4.4 Contextual Objects**

Using function types and the □ modality, it is now possible to work with contextual objects that represent domain-level terms in a certain context, much like in [20,21]. A contextual type (Ψ ⊢ A) is a boxed function type of the form □(El Ψ → El A). It represents domain-level terms of type A with variables from Ψ. Here, we consider the domain-level context Ψ as a term that encodes it. The interpretation will make this precise.

For example, domain-level terms with up to two free variables now appear as terms of type □(El (times (times unit tm) tm) → El tm), as the following example illustrates.

```
box (λu: El (times (times unit tm) tm).
       let x1 = snd (fst u) in
       let x2 = snd u in
       app (lam (λx: El tm. app x1 x)) x2)
```
The context variables x1 and x2 are bound at the meta level.

This representation integrates substitution as usual. For example, given crisp variables m:: El (times c tm) → El tm and n:: El c → El tm for contextual terms, the term box (λu: El c. m (pair u (n u))) represents the substitution of n for the last variable in the context of m.

For working with contextual objects, it is convenient to lift the constants app and lam to contextual types.

```
c: Obj ⊢ app′ : □(El c → El tm) → □(El c → El tm) → □(El c → El tm)
c: Obj ⊢ lam′ : □(El (times c tm) → El tm) → □(El c → El tm)
```

These terms are defined by:

```
app′ := λm, n. let box m′ = m in let box n′ = n in
               box (λu: El c. app (m′ u) (n′ u))

lam′ := λm. let box m′ = m in
            box (λu: El c. lam (λx: El tm. m′ (pair u x)))
```
A contextual type for domain-level variables (as opposed to arbitrary terms) can be defined by restricting the function space in □(El Ψ → El A) to consist only of projections. Projections are functions of the form snd ∘ fst^k, where we write fst^k for the k-fold iteration fst ∘ ··· ∘ fst. Let us write S →_v T for the subtype of S → T consisting only of projections. The contextual type □(El Ψ →_v El A) is then a subtype of □(El Ψ → El A).

With these definitions, we can express a primitive recursion scheme for contextual types. We write it in its general form where the result type A can possibly depend on x. This is only relevant for the dependently typed case; in the simply typed case, the only dependency is on c.

**Lemma 4.** *Let* c: Obj, x: □(El c → El tm) ⊢ A c x *type and define:*

```
Xvar := (c: Obj) → (x: □(El c →v El tm)) → A c x
Xapp := (c: Obj) → (x, y: □(El c → El tm)) → A c x → A c y → A c (app′ x y)
Xlam := (c: Obj) → (x: □(El (times c tm) → El tm)) → A (times c tm) x → A c (lam′ x)
```

*Then,* D̂ *justifies a term*

```
rec: Xvar → Xapp → Xlam → (c: Obj) → (x: □(El c → El tm)) → A c x
```

*such that the following equations are valid.*

```
rec tvar tapp tlam c x = tvar c x                       if x: □(El c →v El tm)
rec tvar tapp tlam c (app′ s t)
    = tapp c s t (rec tvar tapp tlam c s) (rec tvar tapp tlam c t)
rec tvar tapp tlam c (lam′ s)
    = tlam c s (rec tvar tapp tlam (times c tm) s)
```

*Proof (outline).* To outline the proof idea, note first that a function of type (c: Obj) → (x: □(El c → El tm)) → A c x in D̂ corresponds to an inhabitant of A Φ t for each concrete object Φ of D and each inhabitant t: □(El Φ → El tm). This is because naturality constraints for boxed types are vacuous (and □Obj = Obj). Next, note that inhabitants of □(El Φ → El tm) correspond to domain-level terms of type tm in context Φ up to αβη-equality. We can perform a case-distinction on whether such a term is a variable, an abstraction or an application and, depending on the result, use tvar, tapp or tlam to define the required inhabitant of A Φ t.

As a simple example for rec, we can define the function is-lam discussed above by rec (λc, x. false) (λc, x, y, rx, ry. false) (λc, x, rx. true).
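Once contextual terms are read back into first-order syntax, the recursion scheme of Lemma 4 is ordinary primitive recursion; a Python sketch (hypothetical first-order representation, not the paper's semantics) with is-lam as an instance:

```python
# First-order contextual terms: ("var", k) is a context variable,
# ("app", s, t) an application, ("lam", body) an abstraction whose
# body lives in the context extended by one tm-variable.
def rec(t_var, t_app, t_lam, c, x):
    """Primitive recursion over terms in context c (Lemma 4 sketch)."""
    if x[0] == "var":
        return t_var(c, x)
    if x[0] == "app":
        _, s, t = x
        return t_app(c, s, t,
                     rec(t_var, t_app, t_lam, c, s),
                     rec(t_var, t_app, t_lam, c, t))
    _, body = x  # "lam": recurse in the extended context
    return t_lam(c, body, rec(t_var, t_app, t_lam, ("times", c, "tm"), body))

def is_lam(x, c="unit"):
    """True iff x is an abstraction; branches ignore recursive results."""
    return rec(lambda c, x: False,
               lambda c, s, t, rs, rt: False,
               lambda c, s, rs: True, c, x)

def size(x, c="unit"):
    """Another instance of rec: count the nodes of a term."""
    return rec(lambda c, x: 1,
               lambda c, s, t, rs, rt: 1 + rs + rt,
               lambda c, s, rs: 1 + rs, c, x)
```

The three arguments of `rec` correspond to the branch types Xvar, Xapp and Xlam of the lemma.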

## **5 Simple Contextual Modal Type Theory**

We have outlined informally how the internal dependent type theory of D̂ can model contextual types. In this section, we make this precise by giving the interpretation of Cocon [23], a contextual modal type theory where we can work with contextual HOAS trees and computations about them, into D̂. We will focus here on a simply-typed version of Cocon where we use a simply-typed domain language with constants app and lam and also only allow computations about HOAS trees, but do not consider, for example, universes. Concentrating on a stripped-down, simply-typed version of Cocon allows us to focus on the essential aspects, namely how to interpret domain-level contexts and domain-level contextual objects and types semantically. The generalisation to a dependently typed domain level such as LF in Sec. 6 will be conceptually straightforward, although more technical. Handling universes is an orthogonal issue (see also [16]).

We first define our simply-typed domain level with the type tm and the term constants lam and app (see Fig. 1). Following Cocon, we allow computations to be embedded into domain-level terms via unboxing. The intuition is that if a program t promises to compute a value of type ⌈x:tm, y:tm ⊢ tm⌉, then we can embed t directly into a domain-level object by writing lam λx. lam λy. app ⌊t⌋ x, unboxing t. Domain-level objects (resp. types) can be packaged together with their domain-level context to form a contextual object (resp. type). Domain-level contexts are formed as usual, but may contain context variables to describe a yet unknown prefix. Last, we include domain-level substitutions that allow us to move between domain-level contexts. The compound substitution σ, M extends the substitution σ with domain Ψ to a substitution with domain Ψ, x, where M replaces x. Following [18,23], we do not store the domain (like Ψ) in the


**Fig. 1.** Syntax of Cocon with a fixed simply-typed domain tm

substitution; it can always be recovered before applying the substitution. We also include *weakening substitutions*, written as wk_Ψ, to describe the weakening of the domain Ψ to Ψ, x1:A1, ..., xn:An. Weakening substitutions are necessary, as they allow us to express the weakening of a context variable ψ. Identity is a special form of the wk_Ψ substitution, which follows immediately from the typing rule of wk_Ψ. Composition is admissible.

We summarise the typing rules for domain-level terms and types in Fig. 2. We also include typing rules for domain-level contexts. Note that since we restrict ourselves to a simply-typed domain level, we simply check that A is a well-formed type. We defer the reduction and expansion rules to the appendix and only remark here that equality for domain-level terms and substitutions is modulo βη. In particular, unboxing the contextual object (Φ̂ ⊢ N) with a substitution σ reduces to [σ]N.

In our grammar, we distinguish between the contextual type (Ψ ⊢ A) and the more restricted contextual type (Φ ⊢_v A), which characterises only variables of type A from the domain-level context Φ. We give here two sample typing rules for (Φ ⊢_v A), which are the ones used most in practice, to illustrate the main idea. We embed contextual objects into computations via the □ modality. Computation-level types include boxed contextual types, ⌈Φ ⊢ A⌉, and function types, written as (y : τ̆1) ⇒ τ2. We overload the function space and allow as domain of discourse both computation-level types and the schema ctx of domain-level contexts, although only in the latter case can y occur in τ2. We use fn y ⇒ t to introduce functions of both kinds. We also overload function application t s to eliminate function types (y : τ1) ⇒ τ2 and (y : ctx) ⇒ τ2, although in the latter case s stands for a domain-level context. We separate domain-level contexts from contextual objects, as we do not allow functions that return a domain-level context.

Γ; Ψ ⊢ M : A   Term M has type A in domain-level context Ψ and context Γ

$$\frac{\Gamma \vdash \Psi : \mathsf{ctx} \quad x{:}A \in \Psi}{\Gamma; \Psi \vdash x : A} \qquad \frac{\Gamma \vdash \Psi : \mathsf{ctx}}{\Gamma; \Psi \vdash \mathsf{lam} : (\mathsf{tm} \to \mathsf{tm}) \to \mathsf{tm}} \qquad \frac{\Gamma \vdash \Psi : \mathsf{ctx}}{\Gamma; \Psi \vdash \mathsf{app} : \mathsf{tm} \to \mathsf{tm} \to \mathsf{tm}}$$

$$\frac{\Gamma; \Psi \vdash M : A \to B \quad \Gamma; \Psi \vdash N : A}{\Gamma; \Psi \vdash M\, N : B} \qquad \frac{\Gamma; \Psi, x{:}A \vdash M : B}{\Gamma; \Psi \vdash \lambda x.M : A \to B} \qquad \frac{\Gamma \vdash t : \lceil \Phi \vdash A \rceil \quad \Gamma; \Psi \vdash \sigma : \Phi}{\Gamma; \Psi \vdash \lfloor t \rfloor_\sigma : A}$$

Γ; Φ ⊢ σ : Ψ   Substitution σ provides a mapping from the (domain) context Ψ to Φ

$$\frac{\Gamma \vdash \Psi, \vec{x}{:}\vec{A} : \mathsf{ctx}}{\Gamma; \Psi, \vec{x}{:}\vec{A} \vdash \mathsf{wk}_\Psi : \Psi} \qquad \frac{\Gamma \vdash \Phi : \mathsf{ctx}}{\Gamma; \Phi \vdash \cdot\; : \cdot} \qquad \frac{\Gamma; \Phi \vdash \sigma : \Psi \quad \Gamma; \Phi \vdash M : A}{\Gamma; \Phi \vdash \sigma, M : \Psi, x{:}A}$$

Γ ⊢ Ψ : ctx   Domain-level context Ψ is well-formed

$$\frac{}{\Gamma \vdash \cdot\; : \mathsf{ctx}} \qquad \frac{\Gamma(\psi) = \mathsf{ctx}}{\Gamma \vdash \psi : \mathsf{ctx}} \qquad \frac{\Gamma \vdash \Psi : \mathsf{ctx}}{\Gamma \vdash \Psi, x{:}A : \mathsf{ctx}}$$

**Fig. 2.** Typing Rules for Domain-level Terms, Substitutions and Contexts

The recursor is written as rec^I B⃗ Ψ t. Here, t describes a term of type ⌈Ψ ⊢ tm⌉ that we recurse over and B⃗ describes the different branches that we can take

depending on the value computed by t. As is common when we have dependencies, we annotate the recursor with the typing invariant I. Here, we consider only the recursor over domain-level terms of type tm. Hence, we annotate it with I = (ψ : ctx) ⇒ (y : ⌈ψ ⊢ tm⌉) ⇒ τ. To check that the recursor rec^I B⃗ Ψ t has type [Ψ/ψ]τ, we check that each of the three branches has the specified type I. In the base case, we may assume in addition to ψ : ctx that we have a variable p : ⌈ψ ⊢_v tm⌉ and check that the body has the appropriate type. If we encounter a contextual object built with the domain-level constant app, then we choose the branch b_app. We assume ψ: ctx, m: ⌈ψ ⊢ tm⌉ and n: ⌈ψ ⊢ tm⌉, as well as f_m and f_n, which stand for the recursive calls on m and n respectively. We then check that the body t_app is well-typed. If we encounter a domain object built with the domain-level constant lam, then we choose the branch b_lam. We assume ψ: ctx and m: ⌈ψ, x:tm ⊢ tm⌉ together with the recursive call f_m on m in the extended LF context ψ, x:tm. We then check that the body t_lam is well-typed. The typing rules for computations are given in Fig. 3. We omit the reduction rules here and refer the interested reader to the appendix.

#### **5.1 Interpretation**

We now give an interpretation of simply-typed Cocon in a presheaf model with a cartesian closed universe of representables. Let us first extend the internal dependent type theory with the constant tm for modelling the domain-level type constant tm and with the constants app: El tm → El tm → El tm and

Γ ⊢ C : T   Contextual object C has contextual type T

$$\frac{\Gamma; \Psi \vdash M : A}{\Gamma \vdash (\hat\Psi \vdash M) : (\Psi \vdash A)} \qquad \frac{\Gamma \vdash \Psi : \mathsf{ctx} \quad x{:}A \in \Psi}{\Gamma \vdash (\hat\Psi \vdash x) : (\Psi \vdash_v A)} \qquad \frac{x{:}\lceil \Phi \vdash_v A \rceil \in \Gamma \quad \Gamma; \Psi \vdash \mathsf{wk}_\Phi : \Phi}{\Gamma \vdash (\hat\Psi \vdash \lfloor x \rfloor_{\mathsf{wk}_\Phi}) : (\Psi \vdash_v A)}$$

Γ ⊢ t : τ   Term t has computation type τ

$$\frac{y : \breve\tau \in \Gamma}{\Gamma \vdash y : \breve\tau} \qquad \frac{\Gamma \vdash C : T}{\Gamma \vdash \lceil C \rceil : \lceil T \rceil} \qquad \frac{\Gamma \vdash t : (y : \breve\tau_1) \Rightarrow \tau_2 \quad \Gamma \vdash s : \breve\tau_1}{\Gamma \vdash t\, s : [s/y]\tau_2} \qquad \frac{\Gamma, y : \breve\tau_1 \vdash t : \tau_2 \quad \Gamma \vdash (y : \breve\tau_1) \Rightarrow \tau_2 : \mathsf{type}}{\Gamma \vdash \mathsf{fn}\; y \Rightarrow t : (y : \breve\tau_1) \Rightarrow \tau_2}$$

Recursor over domain-level terms, with I = (ψ : ctx) ⇒ (y : ⌈ψ ⊢ tm⌉) ⇒ τ:

$$\frac{\Gamma \vdash t : \lceil \Psi \vdash \mathsf{tm} \rceil \quad \Gamma \vdash I : u \quad \Gamma \vdash b_v : I \quad \Gamma \vdash b_{\mathsf{app}} : I \quad \Gamma \vdash b_{\mathsf{lam}} : I}{\Gamma \vdash \mathsf{rec}^I\, (b_v \mid b_{\mathsf{app}} \mid b_{\mathsf{lam}})\ \Psi\ t : [\Psi/\psi]\tau}$$

Branch for variables:

$$\frac{\Gamma, \psi : \mathsf{ctx}, p : \lceil \psi \vdash_v \mathsf{tm} \rceil \vdash t_v : \tau}{\Gamma \vdash (\psi, p \mapsto t_v) : I}$$

Branch for applications app:

$$\frac{\Gamma, \psi : \mathsf{ctx}, m{:}\lceil \psi \vdash \mathsf{tm} \rceil, n{:}\lceil \psi \vdash \mathsf{tm} \rceil, f_m{:}\tau, f_n{:}\tau \vdash t_{\mathsf{app}} : \tau}{\Gamma \vdash (\psi, m, n, f_n, f_m \mapsto t_{\mathsf{app}}) : I}$$

Branch for functions lam:

$$\frac{\Gamma, \phi : \mathsf{ctx}, m{:}\lceil \phi, x{:}\mathsf{tm} \vdash \mathsf{tm} \rceil, f_m{:}[(\phi, x{:}\mathsf{tm})/\psi]\tau \vdash t_{\mathsf{lam}} : [\phi/\psi]\tau}{\Gamma \vdash (\phi, m, f_m \mapsto t_{\mathsf{lam}}) : I}$$

**Fig. 3.** Typing Rules for Contextual Objects and Computations

lam: (El tm → El tm) → El tm to model the corresponding domain-level constants app and lam.

We can now translate domain-level and computation-level types of Cocon into the internal dependent type theory for D̂. We do so by interpreting the domain-level terms, types, substitutions, and contexts (see Fig. 4). All translations are on well-typed terms and types. Domain-level types are interpreted as the terms of type Obj in the internal dependent type theory that represent them. Domain-level contexts are also interpreted as terms of type Obj by ⟦Γ ⊢ Ψ : ctx⟧. For example, a domain-level context x:tm, y:tm is interpreted as times (times unit tm) tm : Obj. A domain-level substitution with domain Ψ and codomain Φ becomes a term of type El e′ that is parameterised by an element u: El e, where e = ⟦Γ ⊢ Φ : ctx⟧ and e′ = ⟦Γ ⊢ Ψ : ctx⟧. As e′ is some product, for example times (times unit tm) tm, the domain-level substitution is translated into an n-ary tuple. A weakening substitution Γ; Ψ, x:tm ⊢ wk_Ψ : Ψ is interpreted as fst u where u: El (times e tm) and e = ⟦Γ ⊢ Ψ : ctx⟧. More generally, when we weaken a context Ψ by n declarations x1:A1, ..., xn:An, we interpret wk_Ψ as fst^n u.
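The context interpretation can be tried out directly; a Python sketch (a hypothetical encoding of environments as nested pairs, mirroring times (times unit tm) tm) of variable lookup and weakening:

```python
# A context x1:tm, ..., xn:tm is interpreted as the nested product
# times (... (times unit tm) ...) tm; an environment for it is the
# nested pair ((... ((), v1) ...), vn).
def pair(u, x):
    return (u, x)

def fst(u):
    return u[0]

def snd(u):
    return u[1]

def weaken(u, n=1):
    """Interpret wk: drop the n most recent declarations (fst^n)."""
    for _ in range(n):
        u = fst(u)
    return u

def lookup(u, k):
    """Interpret the variable with k later declarations: snd (fst^k u)."""
    return snd(weaken(u, k))

# Environment for a context x1:tm, x2:tm, x3:tm:
env = pair(pair(pair((), "v1"), "v2"), "v3")
```

Weakening simply peels off pair components, which is why identity substitutions arise as the special case n = 0.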

A well-typed domain-level term, Γ; Ψ ⊢ M : A, is mapped to an object of type El ⟦A⟧ that depends on u: El ⟦Γ ⊢ Ψ : ctx⟧.

Hence the translation of a well-typed domain-level term is indexed by u that stands for the term-level interpretation of a domain-level context Φ. Initially, u

Interpretation of domain-level types

- ⟦tm⟧ = tm
- ⟦A → B⟧ = arrow ⟦A⟧ ⟦B⟧

Interpretation of domain-level contexts

- ⟦Γ ⊢ ψ : ctx⟧ = ψ
- ⟦Γ ⊢ · : ctx⟧ = unit
- ⟦Γ ⊢ (Ψ, x:A) : ctx⟧ = times e ⟦A⟧ where ⟦Γ ⊢ Ψ : ctx⟧ = e

Interpretation of domain-level terms, where u: El e and ⟦Γ ⊢ Ψ : ctx⟧ = e

- ⟦Γ; Ψ ⊢ x : A⟧ u = snd (fstᵏ u) where Ψ = Ψ₀, x:A, yₖ:Aₖ, …, y₁:A₁
- ⟦Γ; Ψ ⊢ λx. M : A → B⟧ u = arrow-i (λx: El ⟦A⟧. e) where ⟦Γ; Ψ, x:A ⊢ M : B⟧ (pair u x) = e
- ⟦Γ; Ψ ⊢ M N : B⟧ u = arrow-e e₁ e₂ where ⟦Γ; Ψ ⊢ M : A → B⟧ u = e₁ and ⟦Γ; Ψ ⊢ N : A⟧ u = e₂
- ⟦Γ; Ψ ⊢ t σ : A⟧ u = let box x = e₁ in x e₂ where ⟦Γ ⊢ t : (Φ ⊢ A)⟧ = e₁ and ⟦Γ; Ψ ⊢ σ : Φ⟧ u = e₂
- ⟦Γ; Ψ ⊢ app : tm → tm → tm⟧ u = arrow-i (λx: El tm. arrow-i (λy: El tm. app x y))
- ⟦Γ; Ψ ⊢ lam : (tm → tm) → tm⟧ u = arrow-i (λf: El (arrow tm tm). lam (λx: El tm. arrow-e f x))

Interpretation of domain-level substitutions, where u: El e and ⟦Γ ⊢ Φ : ctx⟧ = e

- ⟦Γ; Ψ ⊢ · : ·⟧ u = terminal
- ⟦Γ; Ψ ⊢ (σ, M) : Φ, x:A⟧ u = pair e₁ e₂ where ⟦Γ; Ψ ⊢ σ : Φ⟧ u = e₁ and ⟦Γ; Ψ ⊢ M : A⟧ u = e₂
- ⟦Γ; Ψ, x₁:A₁, …, xₙ:Aₙ ⊢ wk_Φ : Φ⟧ u = fstⁿ u

**Fig. 4.** Interpretation of Domain-level Types and Terms

is simply a variable. However, when we translate Γ; Φ ⊢ λx.M : A → B given u: El e where ⟦Γ ⊢ Φ : ctx⟧ = e, we need to recursively translate M in the extended domain-level context Φ, x:A, and hence we also need to build a term pair u x that inhabits El (times e ⟦A⟧). The translation of Γ; Φ, x:A ⊢ M : B will return a term e′ that may contain x. However, note that x will eventually be bound in arrow-i (λx: El ⟦A⟧. e′). When we translate a variable x where Φ = Φ₀, x:A, yₖ:Aₖ, …, y₁:A₁, we return snd (fstᵏ u). We translate Γ; Φ ⊢ t σ : A directly using the let box-construct, where the domain-level substitution σ is simply translated into a tuple. As the computation t has the contextual type (Ψ ⊢ tm), its translation will be of type (El e → El tm) where e = ⟦Γ ⊢ Ψ : ctx⟧. Hence we can simply extract a function x : El e → El tm using the let box construct and pass to it the interpretation of σ. The translation of domain-level applications and of the domain-level constants app and lam is straightforward.
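The variable case can be illustrated by a tiny Python sketch of lookup in a nested-pair environment (the encoding and the helper name `lookup` are ours; in the model this is the term snd (fstᵏ u)):

```python
# Hypothetical illustration of the variable case: for
# Phi = Phi0, x:A, yk:Ak, ..., y1:A1 the variable x is interpreted as
# snd (fst^k u), where u is a nested-pair environment for Phi.

def fst(u): return u[0]
def snd(u): return u[1]

def lookup(u, k):
    """snd (fst^k u): skip the k declarations above x, then project x."""
    for _ in range(k):
        u = fst(u)
    return snd(u)

# Environment for Phi0, x:A, y2:A2, y1:A1 (built left-nested, newest last):
u = ((((), "x_val"), "y2_val"), "y1_val")
assert lookup(u, 2) == "x_val"   # x sits below y2 and y1, so k = 2
assert lookup(u, 0) == "y1_val"  # the most recent declaration
```
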

The interpretation of contextual types (Ψ ⊢ A) makes explicit the fact that they correspond to functions El e → El ⟦A⟧ where e = ⟦Γ ⊢ Ψ : ctx⟧ (see Fig. 5). Consequently, the corresponding contextual object (Φ̂ ⊢ M) is interpreted as a

Interpretation of contextual objects (C)

- ⟦Γ ⊢ (Φ̂ ⊢ M):(Φ ⊢ A)⟧ = λu: El e. e′ where ⟦Γ ⊢ Φ : ctx⟧ = e and ⟦Γ; Φ ⊢ M : A⟧ u = e′
- ⟦Γ ⊢ (Φ̂ ⊢ M):(Φ ⊢ᵥ A)⟧ = λu: El e. e′ where ⟦Γ ⊢ Φ : ctx⟧ = e and ⟦Γ; Φ ⊢ M : A⟧ u = e′

Interpretation of contextual types (T)

- ⟦Φ ⊢ A⟧ = El e → El ⟦A⟧ where ⟦Γ ⊢ Φ : ctx⟧ = e
- ⟦Φ ⊢ᵥ A⟧ = El e →ᵥ El ⟦A⟧ where ⟦Γ ⊢ Φ : ctx⟧ = e

**Fig. 5.** Interpretation of Contextual Objects and Types


Interpretation of computation-level types (τ̆)

- ⟦T⟧ = the interpretation of the contextual type T (see Fig. 5)
- ⟦(x:τ̆₁) ⇒ τ₂⟧ = (x: ⟦τ̆₁⟧) → ⟦τ₂⟧
- ⟦ctx⟧ = Obj

Computation-level typing contexts (Γ)

- ⟦·⟧ = ·
- ⟦Γ, x: τ̆⟧ = ⟦Γ⟧, x: ⟦τ̆⟧

Interpretation of computations (Γ ⊢ t : τ; without recursor)

- ⟦Γ ⊢ C : T⟧ = box e where the contextual-object interpretation gives ⟦Γ ⊢ C : T⟧ = e
- ⟦Γ ⊢ t₁ t₂ : τ⟧ = e₁ e₂ where ⟦Γ ⊢ t₁ : (x:τ̆₂) ⇒ τ⟧ = e₁ and ⟦Γ ⊢ t₂ : τ̆₂⟧ = e₂
- ⟦Γ ⊢ fn x ⇒ t : (x:τ̆₁) ⇒ τ₂⟧ = λx: ⟦τ̆₁⟧. e where ⟦Γ, x:τ̆₁ ⊢ t : τ₂⟧ = e
- ⟦Γ ⊢ x : τ⟧ = x

**Fig. 6.** Interpretation of Computation-level Types and Terms – without recursor

function. Similarly, (Ψ ⊢ᵥ A) is mapped to the restricted function space, denoted by →ᵥ, which describes functions whose bodies only contain projections.

Last, we give the interpretation of computation-level types, contexts, and terms (see Fig. 6). It is mostly straightforward: we simply map T to ⟦T⟧, and C is interpreted as a boxed term.

The interpretation of the recursor is now also straightforward (see Fig. 7). In Lemma 4, we expressed a primitive recursion scheme in our internal type theory and defined a term rec together with its type. We now interpret every branch of our computation-level recursor as a function of the required type in our internal type theory. While this is somewhat tedious, it is straightforward.

We can now show that all well-typed domain-level and computation-level objects are translated into well-typed constructions in our internal type theory. As a consequence, we can show that equality in Cocon is equivalent to the corresponding equivalence in our internal type theoretic interpretation.

Interpretation of the recursor for I = (ψ : ctx) ⇒ (y : (ψ ⊢ tm)) ⇒ τ

- ⟦Γ ⊢ rec_I (b_v | b_app | b_lam) Ψ t : [Ψ/ψ, t/y]τ⟧ = rec e_v e_app e_lam e_c e where ⟦Γ ⊢ b_v : I⟧ = e_v, ⟦Γ ⊢ b_app : I⟧ = e_app, ⟦Γ ⊢ b_lam : I⟧ = e_lam, ⟦Γ ⊢ Ψ : ctx⟧ = e_c and ⟦Γ ⊢ t : (Ψ ⊢ tm)⟧ = e

Interpretation of Variable Branch

- ⟦Γ ⊢ (ψ, x → t_v) : I⟧ = λψ: Obj. λx: (El ψ →ᵥ El tm). e where ⟦Γ, ψ : ctx, x : (ψ ⊢ᵥ tm) ⊢ t_v : [x/y]τ⟧ = e

Interpretation of Application Branch

- ⟦Γ ⊢ (ψ, m, n, f_n, f_m → t_app) : I⟧ = λψ: Obj. λm, n: (El ψ → El tm). λf_m: ⟦[m/y]τ⟧. λf_n: ⟦[n/y]τ⟧. e where ⟦Γ, ψ:ctx, m:(ψ ⊢ tm), n:(ψ ⊢ tm) ⊢ t_app : [(ψ ⊢ app m n)/y]τ⟧ = e

Interpretation of Lambda-Abstraction Branch

- ⟦Γ ⊢ (ψ, m, f_m → t_lam) : I⟧ = λψ: Obj. λm: (El (times ψ tm) → El tm). λf_m: τ_m. e where ⟦[(ψ, x:tm)/ψ, m/y]τ⟧ = τ_m and ⟦Γ, ψ:ctx, m:(ψ, x:tm ⊢ tm) ⊢ t_lam : [(ψ ⊢ lam λx.m)/y]τ⟧ = e

**Fig. 7.** Interpretation of Recursor

**Lemma 5.** *The interpretation maintains the following typing invariants:*

**–** *If* Γ ⊢ Ψ : ctx *then* ⟦Γ⟧ ⊢ ⟦Γ ⊢ Ψ : ctx⟧ : Obj.
**–** *If* Γ; Ψ ⊢ M : A *then* ⟦Γ⟧, u: El ⟦Γ ⊢ Ψ : ctx⟧ ⊢ ⟦Γ; Ψ ⊢ M : A⟧ u : El ⟦A⟧.
**–** *If* Γ; Ψ ⊢ σ : Ψ *then* ⟦Γ⟧, u: El ⟦Γ ⊢ Ψ : ctx⟧ ⊢ ⟦Γ; Ψ ⊢ σ : Ψ⟧ u : El ⟦Ψ⟧.
**–** *If* Γ ⊢ C : T *then* ⟦Γ⟧ ⊢ ⟦Γ ⊢ C : T⟧ : ⟦T⟧.
**–** *If* Γ ⊢ t : τ *then* ⟦Γ⟧ ⊢ ⟦Γ ⊢ t : τ⟧ : ⟦τ⟧.

The proof goes by induction on derivations.

**Proposition 1 (Soundness).** *The following are true.*

**–** *If* Γ; Ψ ⊢ M ≡ N : A *then* ⟦Γ⟧, u: El ⟦Ψ⟧ ⊢ ⟦Γ; Ψ ⊢ M : A⟧ u = ⟦Γ; Ψ ⊢ N : A⟧ u : El ⟦A⟧.
**–** *If* Γ; Ψ ⊢ σ ≡ σ′ : Φ *then* ⟦Γ⟧, u: El ⟦Ψ⟧ ⊢ ⟦Γ; Ψ ⊢ σ : Φ⟧ u = ⟦Γ; Ψ ⊢ σ′ : Φ⟧ u : El ⟦Φ⟧.
**–** *If* Γ ⊢ t₁ ≡ t₂ : τ *then* ⟦Γ⟧ ⊢ ⟦Γ ⊢ t₁ : τ⟧ = ⟦Γ ⊢ t₂ : τ⟧ : ⟦τ⟧.

## **6 Presheaves on a Small Category with Attributes**

To explain the core of our approach as simply as possible, we have concentrated on a simply-typed domain language. In the remaining space, we outline how our approach generalises to dependent domain languages like LF.

We follow the same approach as above. We start from a term model D of the domain language and then interpret contextual types in the presheaf category <sup>D</sup>-. In the simply-typed case above, D was a small cartesian closed category. In the

dependent case, D is a small *Category with Attributes*. Categories with attributes (CwAs) [11] are a general notion of model for dependent type theories that is suitable for modelling dependent domain languages like LF.

With this change, we follow essentially the same approach as above. The main difference is that the universe of representables now makes available the CwA-structure of D instead of the cartesian closed structure. The following section outlines this in analogy to Sec. 4.1.

## **6.1 Yoneda CwA**

In a Yoneda CwA we again have a type for the objects of D, which we now denote Ctx. In the term model for LF, these would be the LF contexts. The type Ty c represents (possibly dependent) LF types in context c. Contexts can be built with the constants nil and cons.

$$\mathsf{Ctx}\ \mathsf{type} \qquad \mathsf{nil}\colon \mathsf{Ctx} \qquad c\colon \mathsf{Ctx} \vdash \mathsf{Ty}\ c\ \mathsf{type} \qquad \mathsf{cons}\colon (c\colon \mathsf{Ctx}) \to (a\colon \mathsf{Ty}\ c) \to \mathsf{Ctx}$$

Both Ctx and Ty c are constant presheaves, i.e. ⟦Ctx⟧ = Ctx and ⟦Ty c⟧ = Ty c. As in Sec. 4.1, we consider the contexts as codes of a universe.

$$c \colon \mathsf{Ctx} \vdash \mathsf{El } c \text{ type}$$

The type El c has the same interpretation as before and is essentially just the Yoneda embedding. The morphisms <sup>c</sup> <sup>→</sup> <sup>d</sup> of the CwA <sup>D</sup> thus appear as functions of type El <sup>c</sup> <sup>→</sup> El <sup>d</sup>.

The axioms of a CwA can be stated using terms and equations in the internal language of <sup>D</sup>-. For example, substitution on types and context projection morphisms are given by the following constants.

$$\begin{array}{l} c, d \colon \mathsf{Ctx} \vdash \mathsf{sub} \colon (a \colon \mathsf{Ty}\ d) \to (f \colon \mathsf{El}\ c \to \mathsf{El}\ d) \to \mathsf{Ty}\ c \\ c \colon \mathsf{Ctx},\ a \colon \mathsf{Ty}\ c \vdash \mathsf{p} \colon \mathsf{El}\ (\mathsf{cons}\ c\ a) \to \mathsf{El}\ c \end{array}$$

The other components of a CwA are added similarly and the CwA-axioms [11] are expressed in terms of equations for these constants.

The inhabitants of a type can then be captured by the dependent type

$$c\colon \mathsf{Ctx},\ a\colon \mathsf{Ty}\ c,\ u\colon \mathsf{El}\ c \vdash \mathsf{I}\ a\ u\ \text{ type}$$

defined by I a u := Σv: El (cons c a). (p v) = u. This type contains all values in El (cons c a) whose first projection is u. If one considers u: El c as a dependent tuple of LF terms (one term for each variable in the context represented by c), then I a u represents all the terms that can be appended to this tuple to make it into one of type El (cons c a). Indeed, one can define a pairing operation by pair := λu. λ⟨v, p⟩. v.

$$c\colon \mathsf{Ctx},\ a\colon \mathsf{Ty}\ c \vdash \mathsf{pair}\colon (u\colon \mathsf{El}\ c) \to \mathsf{I}\ a\ u \to \mathsf{El}\ (\mathsf{cons}\ c\ a)$$

With these definitions, we can represent dependent contextual types much like the simply-typed ones. Recall that we had interpreted (Φ ⊢ A) by El ⟦Φ⟧ → El ⟦A⟧, where both ⟦Φ⟧ and ⟦A⟧ were terms of type Obj. In the dependent case, A may depend on Φ. The interpretation of Φ is a term ⟦Φ⟧ : Ctx, much as before. The interpretation of A takes the dependency into account: u: El ⟦Φ⟧ ⊢ ⟦A⟧ u : Ty ⟦Φ⟧. The interpretation of the contextual type (Φ ⊢ A) will then be:

$$(u\colon \mathsf{El}\ \llbracket\Phi\rrbracket) \to \mathsf{I}\ (\llbracket A\rrbracket\ u)\ u$$

It may be interesting to note that (u: El c) → I a u is isomorphic to the type of sections of p: El (cons c a) → El c.
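This isomorphism can be spelled out as follows (a sketch in our notation; refl denotes the trivial equality proof):

```latex
% A section s of p corresponds to a dependent function into I,
% and conversely one recovers the section by first projection.
\begin{aligned}
  s \colon \mathsf{El}\ c \to \mathsf{El}\ (\mathsf{cons}\ c\ a),
  \quad \mathsf{p} \circ s = \mathrm{id}
  &\;\longmapsto\;
  \lambda u.\ \langle s\,u,\ \mathsf{refl} \rangle
  \;\colon\; (u \colon \mathsf{El}\ c) \to \mathsf{I}\ a\ u \\
  h \colon (u \colon \mathsf{El}\ c) \to \mathsf{I}\ a\ u
  &\;\longmapsto\;
  \lambda u.\ \pi_1\,(h\,u)
  \;\colon\; \mathsf{El}\ c \to \mathsf{El}\ (\mathsf{cons}\ c\ a)
\end{aligned}
```

The second component of h u witnesses exactly the section equation p (π₁ (h u)) = u.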

Object-level term constants in LF can be lifted using I. Consider, for example, an encoding of the simply-typed lambda-calculus in LF. It represents only well-typed terms by means of the constants app: Πa, b: ty. tm (arr a b) → tm a → tm b and lam: Πa, b: ty. (tm a → tm b) → tm (arr a b). Therein, the type tm of object-level terms is dependent on an object-level type ty, which may be built using a constant o: ty for a base type and a constant arr: ty → ty → ty for function types. This encoding lifts to the Yoneda CwA as in the simply-typed case:

$$\begin{array}{l}
c\colon \mathsf{Ctx} \vdash \mathsf{ty}\colon \mathsf{Ty}\ c \qquad c\colon \mathsf{Ctx} \vdash \mathsf{tm}\colon \mathsf{Ty}\ (\mathsf{cons}\ c\ \mathsf{ty}) \\
\Gamma \vdash \mathsf{o}\colon \mathsf{I}\ \mathsf{ty}\ u \qquad \Gamma \vdash \mathsf{arr}\colon \mathsf{I}\ \mathsf{ty}\ u \to \mathsf{I}\ \mathsf{ty}\ u \to \mathsf{I}\ \mathsf{ty}\ u \\
\Delta \vdash \mathsf{app}\colon \mathsf{I}\ \mathsf{tm}\ (\mathsf{pair}\ u\ (\mathsf{arr}\ a\ b)) \to \mathsf{I}\ \mathsf{tm}\ (\mathsf{pair}\ u\ a) \to \mathsf{I}\ \mathsf{tm}\ (\mathsf{pair}\ u\ b) \\
\Delta \vdash \mathsf{lam}\colon (\mathsf{I}\ \mathsf{tm}\ (\mathsf{pair}\ u\ a) \to \mathsf{I}\ \mathsf{tm}\ (\mathsf{pair}\ u\ b)) \to \mathsf{I}\ \mathsf{tm}\ (\mathsf{pair}\ u\ (\mathsf{arr}\ a\ b))
\end{array}$$

Here, Γ abbreviates c: Ctx, u:(El c) and Δ abbreviates Γ, a, b:(I ty u). Notice how lam uses higher-order abstract syntax at the meta level.

With these definitions, the interpretation of Cocon is essentially just as before. For working with the dependencies in a Yoneda CwA, we found it very useful to type-check our definitions in Agda, see our sources [1].

## **7 Conclusion**

We have given a rational reconstruction of contextual type theory in presheaf models of higher-order abstract syntax. This provides a semantical way of understanding the invariants of contextual types independently of the algorithmic details of type checking. At the same time, we identify the contextual modal type theory, Cocon, which is known to be normalising, as a syntax for presheaf models of HOAS. By accounting for the Yoneda embedding with a universe à la Tarski, we obtain a manageable way of constructing contextual types in the model, especially in the dependent case. While various forms of universes are being studied in the context of functor categories, e.g. [2,16], we are not aware of previous uses of presheaves over CwAs or similar.

In future work, one may consider using the model as a way of compiling contextual types, by implementing the semantics. In another direction, it may be interesting to apply the syntax of contextual types to other presheaf categories. We also hope that the model will help to guide the further development of Cocon. *Acknowledgements.* We thank the anonymous reviewers for helpful feedback.

## **References**



## Ambiguity, Weakness, and Regularity in Probabilistic Büchi Automata

Christof Löding and Anton Pirogov

RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany {loeding,pirogov}@cs.rwth-aachen.de

Abstract. Probabilistic Büchi automata are a natural generalization of PFA to infinite words, but have been studied in-depth only rather recently and many interesting questions are still open. PBA are known to accept, in general, a class of languages that goes beyond the regular languages. In this work we extend the known classes of restricted PBA which are still regular, strongly relying on notions concerning ambiguity in classical ω-automata. Furthermore, we investigate the expressivity of the not yet considered but natural class of weak PBA, and we also show that the regularity problem for weak PBA is undecidable.

Keywords: probabilistic · Büchi · automata · ambiguity · weak

## 1 Introduction

Probabilistic finite automata (PFA) are defined similarly to nondeterministic finite automata (NFA) with the difference that each transition is equipped with a probability (a value between 0 and 1), such that for each pair of state and letter, the probabilities of the corresponding outgoing transitions sum up to 1. PFA were investigated already in the 1960s in the seminal paper of Rabin [18]. But while the development of the theory of automata on infinite words also started around the same time [7], the model of probabilistic automata on infinite words was first studied systematically in [3]. The central model in this theory is the one of probabilistic Büchi automata (PBA), which are syntactically the same as PFA. The acceptance condition for runs is defined as for standard nondeterministic Büchi automata (NBA): a run on an infinite word is accepting if it visits an accepting state infinitely often (see [23,24] for an introduction to the theory of automata on infinite words). In general, for probabilistic automata one distinguishes different criteria of when a word is accepted. In the positive semantics, it is required that the probability of the set of accepting runs is greater than 0, in the almost-sure semantics it has to be 1, and in the threshold semantics it has to be greater than a given value λ between 0 and 1. It is easy to see that PFA with positive or almost-sure semantics can only accept regular languages, because these conditions correspond to the fact that there is an accepting run or

This work is supported by the German research council (DFG) Research Training Group 2236 UnRAVeL.

© The Author(s) 2020

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 522–541, 2020. https://doi.org/10.1007/978-3-030-45231-5_27

that all runs are accepting. For infinite words the situation is different, because single runs on infinite words can have probability 0. Therefore, the existence of an accepting run is not the same as the set of accepting runs having probability greater than 0 (similarly, almost-sure semantics is not equivalent to all runs being accepting). And in fact, it turns out that PBA with positive (or almost-sure) semantics can accept non-regular languages [3]. This naturally raises the question under which conditions a PBA accepts a regular language.

In [3] a subclass of PBA that accept only regular languages (under positive semantics) is introduced, called uniform PBA. The definition uses a semantic condition on the acceptance probabilities in end components of the PBA. A syntactic class of PBA that accepts only regular languages (under positive and almost-sure semantics) are the hierarchical PBA (HPBA) introduced in [8]. The state space of HPBA is partitioned into a sequence of layers such that for each pair of state and letter there is at most one transition that does not increase the layer. Decidability and expressiveness questions for HPBA have been studied in more detail in [11,10]. While HPBA accept only regular languages for positive and almost-sure semantics, it is not very hard to come up with HPBA that accept non-regular languages under the threshold semantics [8,11] (see also the example in Figure 2(a) on page 10). Restricting HPBA further such that there are only two layers and all accepting states are on the first layer leads to a class of PBA (called simple PBA, SPBA) that accept only regular languages even under threshold semantics [9].
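The syntactic condition on HPBA can be sketched as a small Python check (the dictionary encoding of the transition relation and the function name `is_hierarchical` are ours; the layer assignment is assumed to be given):

```python
# Sketch of the HPBA condition from [8]: for each (state, letter) pair,
# at most one outgoing transition may fail to strictly increase the layer.

def is_hierarchical(delta, layer):
    """delta: dict (p, a) -> set of successors q with positive probability;
    layer: dict state -> int. Returns True iff the HPBA condition holds."""
    for (p, a), succs in delta.items():
        non_increasing = [q for q in succs if layer[q] <= layer[p]]
        if len(non_increasing) > 1:
            return False
    return True

# Two-layer example: from state 0 on 'a', one transition stays on layer 0
# and one moves up to layer 1 -- this is allowed.
delta = {(0, 'a'): {0, 1}, (1, 'a'): {1}}
assert is_hierarchical(delta, {0: 0, 1: 1})
# But two transitions staying on the same layer violate the condition.
assert not is_hierarchical({(0, 'a'): {0, 1}}, {0: 0, 1: 0})
```
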

In this paper, we are also interested in the question under which conditions PBA accept only regular languages. We identify syntactic patterns in the transition structure of PBA whose absence guarantees regularity of the accepted language. These patterns have been used before for the classification of the degree of ambiguity of NFA and NBA [25,19,16]. The degree of ambiguity of a nondeterministic automaton corresponds to the maximal number of accepting runs that a single input word can have. For NBA, the ambiguity can (roughly) be uncountable, countable, or finite. For positive semantics, we show that PBA whose transition structure corresponds to at most countably ambiguous NBA accept only regular languages. For almost-sure semantics, we need a slightly stronger condition for ensuring regularity. But both classes that we identify are easily seen to strictly subsume the class of HPBA. For the emptiness and universality problems for these classes we obtain the same complexities as the ones for HPBA. In the case of threshold semantics, we show that finite ambiguity is a sufficient condition for regularity of the accepted language, generalizing a corresponding result for PFA from [12]. The class of finitely ambiguous PBA strictly subsumes the class of SPBA.

Besides the relation between regularity and ambiguity in PBA, we also investigate the class of weak PBA (abbreviated PWA). In weak Büchi automata, the set of accepting states is a union of strongly connected components of the automaton. We show that PWA with almost-sure semantics define the same class of languages as PBA with almost-sure semantics (which implies that with positive semantics PWA define the same class as probabilistic co-Büchi automata). This corresponds to results for non-probabilistic automata: weak automata with universal semantics (a word is accepted if all runs are accepting) define the same class as Büchi automata with universal semantics, and nondeterministic weak automata correspond to nondeterministic co-Büchi automata (see, e.g., [17], where weak automata are called weak parity automata). Furthermore, it is known that universal Büchi automata, respectively nondeterministic co-Büchi automata, can be transformed into equivalent deterministic automata (with the same acceptance condition). An analogue of deterministic automata in the probabilistic setting are the so-called 0/1 automata, in which each word is either accepted with probability 0 or with probability 1. It is known that almost-sure PBA can be transformed into equivalent 0/1 PBA (see the proof of Theorem 4.13 in [4]). Concerning weak automata, a language can be accepted by a deterministic weak automaton (DWA) if, and only if, it can be accepted by a deterministic Büchi and by a deterministic co-Büchi automaton (this follows from results in [14], see [6] for a more direct construction). We show an analogous result in the probabilistic setting: the class of languages defined by 0/1 PWA corresponds to the intersection of the two classes defined by PWA with almost-sure semantics and with positive semantics, respectively. It turns out that this class contains only regular languages, that is, 0/1 PWA define the same class as DWA.

We also show that the regularity problem for PBA is undecidable (the problem of deciding for a given PBA whether its language is regular). For PBA with positive semantics this is not surprising, as for those already the emptiness problem is undecidable [4]. However, for PBA with almost-sure semantics the emptiness and universality problems are decidable [1,2,8]. We show that regularity is undecidable already for PWA with almost-sure or with positive semantics. The proof also yields that it is undecidable for a fixed regular language whether a given PWA accepts this language.

This work is organized as follows. After introducing basic notations in Section 2 we first characterize various regular subclasses of PBA that we derive from ambiguity patterns in Section 3 and then we derive some related complexity results in Section 4. In Section 5 we present our results concerning weak probabilistic automata and in Section 6 we conclude.

## 2 Preliminaries

First we briefly review some basic definitions.

If Σ is a finite alphabet, then Σ∗ is the set of all finite words and Σω the set of all infinite words w = w₀w₁… with wᵢ ∈ Σ. For a word w we denote by w(i) the i-th symbol wᵢ.

Classical automata used in this work usually have the shape (Q, Σ, Δ, Q₀, F), where Q is a finite set of states, Σ a finite alphabet, Δ ⊆ Q × Σ × Q the transition relation, and Q₀, F ⊆ Q the sets of initial and final states, respectively.

We write Δ(p, a) := {q ∈ Q | (p, a, q) ∈ Δ} to denote the set of successors of p ∈ Q on symbol a ∈ Σ, and Δ(P, w) for P ⊆ Q, w ∈ Σ<sup>∗</sup> with the usual meaning, i.e., states reachable on word w from any state in P.

<sup>A</sup> run of an automaton on a word <sup>w</sup> <sup>∈</sup> <sup>Σ</sup><sup>ω</sup> is an infinite sequence of states q0, q1,... starting in some q<sup>0</sup> ∈ Q<sup>0</sup> such that (qi, w(i), qi+1) ∈ Δ for all i ≥ 0. We say that a set of runs is separated (at time i) when the prefixes of length i of those runs are pairwise different.

As usual, an automaton is deterministic if |Q0| = 1 and |Δ(p, a)| ≤ 1 for all p ∈ Q, a ∈ Σ, and nondeterministic otherwise. For deterministic automata we may use a transition function δ : Q × Σ → Q instead of a relation.

Probabilistic automata we consider have the shape (Q, Σ, δ, μ₀, F), i.e., the transition relation is replaced by a function δ : Q × Σ × Q → [0, 1] which for each state and symbol assigns a probability distribution on successor states (i.e. ∑_{q∈Q} δ(p, a, q) = 1 for all p ∈ Q, a ∈ Σ), and μ₀ : Q → [0, 1] with ∑_{q∈Q} μ₀(q) = 1 is the initial probability distribution on states. The support of a distribution μ is the set supp(μ) := {x | μ(x) > 0}. Similarly as above, we may write δ(μ, w) and mean the resulting probability distribution after reading w ∈ Σ∗, when starting with probability distribution μ.
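As an illustration, the map δ(μ, w) can be sketched in a few lines of Python (the dictionary encoding and the helper names `step` and `run` are ours):

```python
# Minimal sketch of a probabilistic transition function and the induced
# map delta(mu, w) on distributions over states.

def step(delta, mu, a):
    """One-step update of distribution mu on reading symbol a.
    delta: dict (p, a, q) -> probability."""
    out = {}
    for p, prob in mu.items():
        for (p2, a2, q), t in delta.items():
            if p2 == p and a2 == a:
                out[q] = out.get(q, 0.0) + prob * t
    return out

def run(delta, mu, word):
    for a in word:
        mu = step(delta, mu, a)
    return mu

# Fair-coin automaton: on 'a', state 0 stays or moves to 1 with prob 1/2.
delta = {(0, 'a', 0): 0.5, (0, 'a', 1): 0.5, (1, 'a', 1): 1.0}
mu = run(delta, {0: 1.0}, "aa")
assert abs(mu[0] - 0.25) < 1e-9 and abs(mu[1] - 0.75) < 1e-9
```

Note that each row of δ sums to 1, so `step` always returns a probability distribution again.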

For a probabilistic automaton A, the underlying automaton Ā is given by recovering the transition relation Δ := {(p, x, q) | δ(p, x, q) > 0} of positively reachable states and the initial state set Q₀ := supp(μ₀).

As usual, a run of an automaton for finite words is accepting if it ends in a final state. For automata on infinite words, run acceptance is determined by the Büchi (run visits infinitely many final states) or Co-Büchi (run visits finitely many final states) conditions.

We write p →ˣ q if there exists a path from p to q labelled by x ∈ Σ⁺, and p → q if there exists some x such that p →ˣ q. The strongly connected component (SCC) of p ∈ Q is scc(p) := {q ∈ Q | p = q, or p → q and q → p}. The set SCCs(A) := {scc(q) | q ∈ Q} is the set of all SCCs and partitions Q. An SCC is accepting (rejecting) if all (no) runs that stay in it forever are accepting. An SCC is useless if no accepting run can continue from it. An automaton is weak if the set of final states is a union of its SCCs. In this case, Büchi and co-Büchi acceptance are equivalent and we treat weak automata as Büchi automata.
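The SCC notions above can be mirrored in Python via forward and backward reachability (the helper names `reach`, `sccs` and `is_weak` are ours; this naive quadratic computation is only meant to follow the definitions, not to be efficient):

```python
# scc(p) is the intersection of the states reachable from p and the
# states from which p is reachable; weakness means every SCC is either
# fully inside the final states or disjoint from them.

def reach(delta, start):
    seen, todo = {start}, [start]
    while todo:
        p = todo.pop()
        for (p2, _, q) in delta:
            if p2 == p and q not in seen:
                seen.add(q); todo.append(q)
    return seen

def sccs(states, delta):
    rev = {(q, a, p) for (p, a, q) in delta}
    return {frozenset(reach(delta, p) & reach(rev, p)) for p in states}

def is_weak(states, delta, final):
    return all(c <= final or not (c & final) for c in sccs(states, delta))

# 0 -> 1 with a self-loop on each state: the SCCs are {0} and {1}.
delta = {(0, 'a', 0), (0, 'a', 1), (1, 'a', 1)}
assert sccs({0, 1}, delta) == {frozenset({0}), frozenset({1})}
assert is_weak({0, 1}, delta, {1})   # {1} is a full SCC
```
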

A classical automaton is trim if it has no useless SCCs, whereas a probabilistic automaton is trim if it has at most one useless SCC, which is a rejecting sink that we canonically call qrej . We assume w.l.o.g. that all considered automata are trim, which also means that in an underlying automaton the sink qrej is removed.

We call transitions of probabilistic automata that have probability 1 deterministic and otherwise branching. If there are transitions p →ᵃ q and p →ᵃ q′ with q ≠ q′, we call this pattern a fork. Every branching transition clearly has at least one fork. We call a fork (p, q, q′) intra-SCC if p, q, q′ are all in the same SCC; otherwise it is an inter-SCC fork. A run of an automaton is deterministic if it never goes through forks, and limit-deterministic if it goes through only finitely many forks. We say that two deterministic runs merge when they reach the same state simultaneously. For a finite run prefix ρ, we call all valid runs with this prefix continuations of ρ.

A classical automaton <sup>A</sup> accepts <sup>w</sup> <sup>∈</sup> <sup>Σ</sup><sup>ω</sup> if there exists an accepting run on w, and the language L(A) recognized by A is the set of all accepted words. If P is a set of states of an automaton, we write L(P) for the language accepted by this automaton with initial state set P. For sets consisting of one state q, we write L(q) instead of L({q}).

For a probabilistic automaton A and an input word w (finite or infinite), the transition structure of A induces a probability space on the set of runs of A on w in the usual way. We do not provide the details here but rather refer the reader not familiar with these concepts to [4]. In general, we write Pr(E) for the probability of a measurable event E in a probability space. For probabilistic automata, we consider positive, almost-sure and threshold semantics, i.e., an automaton accepts w if the probability of the set of accepting runs on w is > 0, = 1, or > λ (for some fixed λ ∈ ]0, 1[), respectively. For an automaton A these languages are denoted by L>0(A), L=1(A) and L>λ(A), respectively, whereas L(A) := L(Ā) is the language of the underlying automaton. A probabilistic automaton is 0/1 if all words are accepted with either probability 0 or 1 (in this case the languages under the different probabilistic semantics coincide).

To denote the type of an automaton, we use abbreviations of the form XYA(γ) where the type of transition structure is denoted by X ∈ { D (det.), N (nondet.), P (prob.) }, the acceptance condition is specified by Y ∈ { F (finite word), B (Büchi), C (Co-Büchi), W (Weak) }, and for probabilistic transitions the semantics for acceptance is given by γ ∈ {>0,=1,>λ, 0/1}.

By L^γ(XYA) we denote the whole class of languages accepted by the corresponding type of automaton. If L is a set of languages, then co-L denotes the set of all complement languages (similarly, for a single language L, co-L denotes its complement), and BCl(L) denotes the set of all finite boolean combinations of languages in L. We use the notion of regular language both for finite and for infinite words (the type of words is always clear from the context).

## 3 Ambiguity of PBA

Ambiguity of automata refers to the number of different accepting runs on a single word or on all words. An automaton is finitely ambiguous (on w) if there are at most k different accepting runs (on w) for some fixed k ∈ ℕ, and in case of at most one accepting run it is called unambiguous. If on each word there are only finitely many accepting runs, but no constant upper bound over all words, then it is polynomially ambiguous if the number of different run prefixes that are possible for any word prefix of length n can be bounded by a polynomial in n, and otherwise exponentially ambiguous. Finally, if there exist words that have infinitely many accepting runs, but no word on which there are uncountably many accepting runs, then it is countably ambiguous, and otherwise it is uncountably ambiguous.

In [16] (see also [19]), a syntactic characterization of those classes is presented for NBA in terms of simple patterns of states and transitions. We define those patterns here and refer to [16] for further details. An automaton A has an IDA pattern if there exist two states p ≠ q and a word v ∈ Σ^* such that p −v→ p, p −v→ q and q −v→ q. If additionally q ∈ F, then this is also an IDA<sup>F</sup> pattern. Finally, A has an EDA pattern if there exist a state p and a word v ∈ Σ^* such that there are two different paths p −v→ p, and if additionally p ∈ F, this is also an EDA<sup>F</sup> pattern. If a PBA has no EDA pattern, we call it flat, reflecting the naming of a similar concept in other kinds of transition systems (e.g. [15]). The names IDA and EDA abbreviate "infinite/exponential degree of ambiguity", which is what they indicate in the original NFA setting, and we keep those names for consistency.
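These patterns can be decided by simple graph searches on the self-product of the automaton. As an illustration, the following is a minimal sketch (ours, not from the paper; automata are assumed to be encoded as nested dicts `state → symbol → set of successors`) of a check for the EDA pattern:

```python
from collections import deque

def product_reach(delta, start):
    """Pairs (x, y) reachable from `start` in the self-product, where
    both components read the same input symbol simultaneously."""
    seen, todo = {start}, deque([start])
    while todo:
        x, y = todo.popleft()
        for a, xsuccs in delta.get(x, {}).items():
            for x2 in xsuccs:
                for y2 in delta.get(y, {}).get(a, ()):
                    if (x2, y2) not in seen:
                        seen.add((x2, y2))
                        todo.append((x2, y2))
    return seen

def has_eda(delta, states):
    """EDA pattern: some state p admits two distinct runs p -v-> p on
    the same word v; equivalently, the self-product has a path
    (p,p) -> (r,s) -> (p,p) with r != s (unoptimised small-n sketch)."""
    for p in states:
        for (r, s) in product_reach(delta, (p, p)):
            if r != s and (p, p) in product_reach(delta, (r, s)):
                return True
    return False
```

A flatness check for a PBA would apply `has_eda` to its underlying NBA; the IDA pattern can be detected analogously on the triple product.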

By k-NBA, n^k-NBA, 2^n-NBA and ℵ₀-NBA we denote the subsets of at most finitely, polynomially, exponentially and countably ambiguous NBA, respectively (and similarly for other types of automata). When speaking about the ambiguity of some PBA A, we mean the ambiguity of the trimmed underlying NBA A⁻.

In [8], hierarchical PBA (HPBA) were identified as a syntactic restriction on PBA which ensures regularity under positive and almost-sure semantics. A PBA with a unique initial state is hierarchical, if it admits a ranking on the states such that at most one successor on a symbol has the same rank, and no successor has a smaller rank. A HPBA has k levels if it can be ranked with only k different values. Simple PBA (SPBA) were introduced in [9] and are restricted HPBA with two levels such that all accepting states are on level 0.

Fig. 1: Illustration of the automata classes with restricted ambiguity as presented for NBA in [16], which are characterized by the absence of the state patterns IDA, IDA<sup>F</sup>, EDA and EDA<sup>F</sup>, and their relation to the restricted classes called "hierarchical PBA" (HPBA) [8] and "simple PBA" (SPBA) [9]. We identify classes in this hierarchy which can be seen as extensions "in spirit" of SPBA and HPBA, respectively, subsuming them while also preserving their good properties, such as definability by syntactic means, regularity under different semantics and several complexity results.

First, we show how HPBA relate to the ambiguity hierarchy, which can easily be derived by inspection of the definitions. A visual illustration is given in Figure 1.

## Proposition 1 (Relation of HPBA and the ambiguity hierarchy).


Starting from these observations, this work was motivated by the question whether the ambiguity restrictions, which are only implicit in HPBA and SPBA, can be used explicitly to obtain larger classes with good properties. In the following we answer this question positively.

### 3.1 From classical to probabilistic automata

First, we observe that probabilistic automata can recognize regular languages even under severe ambiguity restrictions.

Proposition 2. Let A be a DBA. Then there exists an unambiguous PBA B such that L^{>0}(B) = L^{=1}(B) = L(A).

Proof. As A is a (w.l.o.g. complete) DBA, there exists exactly one run on each word, and all transitions, when seen as PBA transitions, must have probability 1. Clearly this unique natural 0/1 PBA obtained from A accepts the same language under both positive and almost-sure semantics, and it is trivially unambiguous. □
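The proof amounts to reading the DBA's transition function as a trivial probability assignment. A minimal sketch under our own encoding, where `delta[q][a]` is the unique a-successor of a complete DBA:

```python
def dba_to_pba(delta):
    """Proposition 2's construction: every transition of a complete DBA
    gets probability 1, yielding a 0/1 PBA with exactly one run per
    word (hence trivially unambiguous)."""
    return {q: {a: {succ: 1.0} for a, succ in by_sym.items()}
            for q, by_sym in delta.items()}
```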

Limit-deterministic NBA (LDBA) are NBA which are deterministic in all non-rejecting SCCs. The natural mapping of LDBA into PBA [4, Lemma 4.2] trivially yields countably ambiguous automata (because the deterministic part of an LDBA cannot contain an EDA<sup>F</sup> pattern, whose presence would imply uncountable ambiguity [16]). The following result shows that unambiguous PBA under positive semantics already suffice for all regular languages.

Theorem 1. Let L ⊆ Σ^ω be a regular language. Then there exists an unambiguous PBA B such that L^{>0}(B) = L.

Proof (sketch). Let A = (Q, Σ, δ, q0, c) be a deterministic parity automaton accepting L, i.e., a finite automaton with priority function c : Q → {1,...,m} such that w ∈ L(A) iff the smallest priority that is seen infinitely often on the unique run of A on w is even.

We construct an unambiguous LDBA for L, which then easily yields a PBA^{>0} by assigning arbitrary probabilities ([4, Lemma 4.2]) without influencing the ambiguity. If the parity automaton A has m priorities, the LDBA B can be obtained by taking m+1 copies: m of them are responsible for one priority each, and one is modified to guess which priority i on the input word is the most important one appearing infinitely often along the run of A, and to switch into the corresponding copy. This switching is done unambiguously at the first position after which no priority more important than i appears. □

#### 3.2 From probabilistic to classical automata

First we establish a result for flat PBA, i.e. PBA that have no EDA pattern. In automata without EDA pattern there are no states which are part of two different cycles labeled by the same finite word. Even though we defined flat PBA by using an ambiguity pattern, the set of flat PBA does not correspond to an ambiguity class, but it is useful for our purposes due to the following property:

Lemma 1. If A is a flat PBA and w ∈ Σ^ω, then the probability of a run of A on w being limit-deterministic is 1.

Proof. Let Runs(A, w) denote the set of all runs of A on w and nldRuns(A, w) the subset containing all such runs that are not limit-deterministic. As A is flat, it has no EDA and thus also no EDA<sup>F</sup> pattern, hence A is at most countably ambiguous (by [16]). Moreover, there are not only at most countably many accepting runs on any word, but also at most countably many rejecting runs (which can be seen by a simple generalization of [16, Lemma 4]). But as all runs are disjoint events, each run ρ that uses infinitely many forks has probability 0, and the total number of runs is countable, we see that

$$\Pr(\mathsf{Runs}(\mathcal{A}, w) \setminus \mathsf{nldRuns}(\mathcal{A}, w)) = \sum\_{\rho \in \mathsf{Runs}(\mathcal{A}, w)} \Pr(\rho) - \sum\_{\rho \in \mathsf{nldRuns}(\mathcal{A}, w)} \Pr(\rho) = 1 - 0 = 1. \qquad \square$$

The following lemma characterizes acceptance of PBA under extremal semantics with restricted ambiguity and is crucial for the constructions in the following sections:

## Lemma 2 (Characterizations for extremal semantics). Let A be a PBA.

1. If A is at most countably ambiguous, then w ∈ L^{>0}(A) iff there exists a limit-deterministic accepting run of A on w.
2. If A is at most exponentially ambiguous, then w ∈ L^{=1}(A) iff every run of A on w is accepting.
3. If A is flat, then w ∈ L^{=1}(A) iff there exists no limit-deterministic rejecting run of A on w.

Proof. (1.): For contradiction, assume that w ∈ L^{>0}(A) but every accepting run on w goes through forks infinitely often. Then the probability of every individual accepting run on w is 0. Each run is a measurable event (it is a countable intersection of finite prefix events) and clearly disjoint from all other runs, as two different runs must differ after some finite prefix. But as the number of accepting runs is countable by assumption, by σ-additivity it follows that the probability of the set of all accepting runs is also 0, contradicting the fact that w ∈ L^{>0}(A).

For the other direction, pick a limit-deterministic accepting run ρ of A on w and let uv = w and q ∈ Q such that the state of ρ after reading u is q and there are no forks visited on v. Clearly, the probability to be in q after u in a run of A is positive (because u is finite), and the probability that A continues like ρ from q on v is 1. Hence, the probability of ρ is positive.

(2.): The (⇐) direction is obvious. We now show (⇒). Take some time t after which all accepting runs on w have separated. Assume that some accepting run ρ is not limit-deterministic. Then ρ goes through infinitely many forks after t, which with positive probability lead to a successor from which the probability to accept is 0, and the probability of following ρ itself is also 0. As the probability of following ρ until time t is positive, but afterwards the probability to accept is 0, there is a positive probability that A rejects w. Therefore, all accepting runs on w must be limit-deterministic. Now assume that some run ρ on w is rejecting. Following this run until the time at which ρ has separated from all accepting runs has positive probability, and all continuations must also be rejecting, so A rejects w with positive probability.

(3.): Clearly (⇒) holds, because a limit-deterministic rejecting run has positive probability, i.e., if such a run exists on w, then A cannot accept almost surely. For (⇐), observe that because A is flat, we know by Lemma 1 that runs are limit-deterministic with probability 1. Hence, if there exists no limit-deterministic rejecting run on w (such a run would have positive probability), then with probability 1 runs are limit-deterministic and accepting. □

Using these characterizations, we can provide simple constructions from probabilistic to classical automata.

Theorem 2. Let A be a PBA that is at most countably ambiguous. Then L^{>0}(A) is a regular language.

Proof (sketch). We use an NBA construction that takes two copies of the PBA: in the first copy no state is accepting, and the second copy contains no forks. The constructed NBA guesses a limit-deterministic accepting run by nondeterministically switching from the first into the second copy. □
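Our reading of this proof sketch can be made concrete as follows (a sketch under assumed encodings: `delta[p][a]` is the set of a-successors of the underlying NBA, and states are tagged with their copy number):

```python
from collections import defaultdict

def two_copy_nba(delta, accepting):
    """Two-copy construction from the proof sketch of Theorem 2:
    copy 1 simulates the underlying NBA with no accepting states;
    copy 2 keeps only fork-free transitions (unique successor on the
    symbol); switching from copy 1 to copy 2 guesses the point after
    which the accepting run is deterministic."""
    delta2 = defaultdict(set)
    for p, by_sym in delta.items():
        for a, succs in by_sym.items():
            for q in succs:
                delta2[(p, 1)].add((a, (q, 1)))      # stay in copy 1
                if len(succs) == 1:                  # fork-free: may switch
                    delta2[(p, 1)].add((a, (q, 2)))
                    delta2[(p, 2)].add((a, (q, 2)))
    acc2 = {(q, 2) for q in accepting}               # accept only in copy 2
    return dict(delta2), acc2
```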

Corollary 1. If L^{>0}(A) is not regular, then A contains an EDA<sup>F</sup> pattern.

Theorem 3. Let A be a PBA that is at most exponentially ambiguous or flat. Then L^{=1}(A) is regular and recognizable by a DBA.

Proof (sketch). Both cases (exponentially ambiguous or flat) are shown using a deterministic breakpoint construction resulting in a DBA. In one case it checks whether all runs are accepting, in the other it checks that there is no limit-deterministic rejecting run. □

Corollary 2. If L^{=1}(A) is not regular, then A contains both an EDA and an IDA<sup>F</sup> pattern.

The corollaries above follow directly from the theorems and the syntactic characterization of ambiguity classes [16]. The following proposition states that these characterizations of regularity in terms of the ambiguity patterns are tight.


Fig. 2: (a) A PWA which accepts the non-regular language { w = u$^ω | u ∈ (a+b)^*, #_a(u) > #_b(u) } with threshold 1/2, where #_x(w) denotes the number of occurrences of x ∈ Σ in w. (b) A family of PBA P_λ from [4] such that L^{>0}(P_λ) is not regular for any λ ∈ ]0, 1[. (c) A family of PWA P̃_λ (closely related to [4, Fig. 6]) such that L^{=1}(P̃_λ) is not regular for any λ ∈ ]0, 1[.

#### Proposition 3. There exist PBA...


Proof. (1.) Note that this statement just means that there are PBA accepting non-regular languages, which is well known. For example, the automata family from [4, Fig. 3], depicted in Figure 2(b), accepts non-regular languages under positive semantics and clearly contains an EDA<sup>F</sup> pattern, e.g. there are two different paths from q_0 to q_0 on the word aab.

(2.) The automata family depicted in Figure 2(c) is a simple modification of the PBA family depicted in [4, Fig. 6] and recognizes the same non-regular languages under almost-sure semantics. It does not contain an EDA<sup>F</sup> pattern, because the accepting state is a sink, but it does contain an IDA<sup>F</sup> and an EDA pattern (both e.g. on aab), so it is countably ambiguous and not flat. 

This completes our classification of regular subclasses of PBA under extremal semantics that are defined by ambiguity patterns, showing that going beyond the restricted classes presented above (by allowing more patterns) in general leads to a loss of regularity.

Notice that the presented constructions do not track exact probabilities, just whether transitions have a probability > 0 or = 1. This is a noteworthy observation, as in general, the probabilities do matter for PBA, as shown in [4, Thm. 4.7, Thm. 4.11].

Proposition 4. Let A be a PBA. The exact probabilities in A do not influence L^{>0}(A) if A is at most countably ambiguous, nor L^{=1}(A) if A is at most exponentially ambiguous or flat.

### 3.3 Threshold Semantics

In this section we consider PBA under threshold semantics and we will see that in this setting, we lose regularity much earlier than in the case of extremal semantics, but there is still the large and natural subclass of finitely ambiguous PBA that retains regularity. Before we can show this, we need to derive a suitable characterization of such languages.

We derive it from the following simple observation, which was also used more implicitly in the proof that Simple HPBA with threshold semantics are equivalent to DBA in [9].

Lemma 3. Let A be a PBA. Then for every threshold λ ∈]0, 1], there exists a finite set of probability values V≥<sup>λ</sup> ⊂ [λ, 1] such that for every finite run prefix with probability v in A we have v ≥ λ ⇒ v ∈ V≥<sup>λ</sup>.

Proof. Observe that given a finite set of real numbers R ⊂ [0, 1], the set R_{≥λ} := { r = ∏_i r_i ≥ λ | r_i ∈ R } of products of elements of R that are at least λ must be finite: in any sequence p_1 p_2 ... of p_i ∈ R, at most m = log_c(λ) of the factors can be < 1 (where c = max{r ∈ R | r < 1}) if the product of the sequence is to remain ≥ λ. In our case, let R be the set of distinct probabilities assigned to edges (including the initial edges) in A. As every finite run prefix by definition has as probability the product of its edge probabilities, the statement follows. □
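The finite set V_{≥λ} can also be enumerated directly. A small sketch of this (ours, not from the paper), where `R` is the set of edge probabilities and the empty product 1 is included:

```python
def products_at_least(R, lam):
    """Enumerate V_{>=lam}: all values >= lam that arise as finite
    products of factors from R.  Finiteness follows as in Lemma 3:
    factors equal to 1 do not change a product, and each factor < 1
    shrinks it, so only boundedly many of them keep the product
    above the threshold lam."""
    values, frontier = {1.0}, {1.0}
    while frontier:
        nxt = set()
        for v in frontier:
            for r in R:
                p = v * r
                if p >= lam and p not in values:
                    values.add(p)
                    nxt.add(p)
        frontier = nxt
    return values
```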

If there is just one accepting run (i.e., the automaton is unambiguous), one can easily construct a nondeterministic automaton that guesses an accepting run and tracks it along with its probability value, of which there are only finitely many above the threshold. In the case that there are multiple accepting runs, for acceptance only the sum of their probabilities matters. As individual runs can in principle have arbitrarily small probability values, it is not obvious that the same approach (tracking a set of runs) can work. Determining a suitable cut-off point is not as simple, because it is not apparent when a single run becomes so improbable that it does not matter among the others. However, we will now show that such a cut-off point must exist:

Lemma 4. Let A be a PBA, λ ∈ ]0, 1] a threshold and k ∈ ℕ. There exists ε_k ∈ ]0, λ] such that for all sets R^t = {ρ^t_1, ..., ρ^t_j} of j ≤ k different run prefixes in A of the same length t ∈ ℕ, Pr(R^t) = Σ^j_{i=1} Pr(ρ^t_i) < λ implies that Pr(R^t) < λ − ε_k.

Proof. We prove this by induction on the number of runs k. For k = 1, i.e. a single run prefix, let V_{≥λ} be the finite (by Lemma 3) set of different probability values ≥ λ and let E be the set of distinct edge probabilities in the automaton A. Then clearly v_{max,<λ} := max{ a · b | a · b < λ, a ∈ V_{≥λ}, b ∈ E } is the largest probability value < λ that can correspond to a finite run prefix in A. Hence, we can pick any ε_1 < λ − v_{max,<λ} and immediately get that any run prefix with probability v < λ satisfies v ≤ v_{max,<λ} < λ − ε_1.

Now assume the statement holds for all sets of at most k run prefixes. Let R^t be a set of k + 1 different run prefixes of the same length such that Pr(R^t) < λ, and let ε := ε_k. Then we know that for every subset S of at most k runs of R^t we have Pr(S) < λ − ε. Also, by Lemma 3, every single run prefix can have only one of finitely many probability values in V_{≥ε} that are ≥ ε, and there exists a value v_{max,<ε} denoting the largest possible probability value < ε that a single run prefix can have.

If there exists a run prefix ρ ∈ R^t with probability value v < ε, then we know that Pr(R^t) = Pr(R^t \ {ρ}) + v < (λ − ε) + v_{max,<ε} = λ − (ε − v_{max,<ε}). If every run prefix in R^t has a probability value ≥ ε, then every run prefix in R^t has as probability one of the values in V_{≥ε}. Consider all sums of at most k + 1 values from V_{≥ε}, of which there are finitely many, and pick the largest such sum s that is < λ. Choose ε_{k+1} such that ε_{k+1} < min(ε − v_{max,<ε}, λ − s) to account for both cases. □

From this we can derive the following characterization of languages accepted by finitely ambiguous PBA under threshold semantics:

Lemma 5. Let A be a k-ambiguous PBA and λ ∈ ]0, 1] a threshold. There exists an ε ∈ ]0, λ] such that for all w ∈ Σ^ω: w ∈ L^{>λ}(A) iff there exists a set R of limit-deterministic accepting runs of A on w with Pr(R) > λ, Pr(S) ≤ λ for all S ⊂ R, and at most one run ρ ∈ R with Pr(ρ) < ε.

Proof. Clearly (⇐) holds, as then w is accepted with probability ≥ Pr(R) > λ. We now show (⇒). In a finitely ambiguous PBA there are only finitely many different accepting runs on each word. Furthermore, as after finite time all accepting runs have separated, and each accepting run that visits forks infinitely often has probability 0, such runs do not contribute to the acceptance probability and can be ignored. Hence, if w ∈ L^{>λ}(A), there is a number of accepting runs that eventually all become deterministic, each such run has positive probability, and these probabilities must sum to more than λ.

Let R be a set of different limit-deterministic accepting runs of A on w such that Pr(R) > λ and Pr(S) ≤ λ for all S ⊂ R. As there are only finitely many accepting runs, such a set R must exist. Furthermore, notice that each limit-deterministic run has a finite prefix which has the same probability as the whole run, so there exists a time t such that the probability of the set of all different prefixes of runs in R of length t is exactly Pr(R), so that Lemma 4 applies.

Now pick ε := ε_k as given by Lemma 4. We claim that at most one run ρ ∈ R can have probability less than ε. If there is no such run in R, we are done. Otherwise let ρ be a run with Pr(ρ) =: p < ε and notice that by choice of R we have Pr(R \ {ρ}) =: s ≤ λ. It cannot be the case that s < λ, as then by Lemma 4 we have s < λ − ε, which implies Pr(R) = s + p < λ, a contradiction. Hence, assume that s = λ. But then, if there is any other run ρ′ ∈ R with ρ′ ≠ ρ and Pr(ρ′) =: p′ < ε, by the same argument applied to R \ {ρ, ρ′} we get s − p′ < λ − ε and hence s < λ − ε + p′ < λ, a contradiction. Therefore, no other run in R can have probability < ε. □

Now we can perform the intended automaton construction to show:

Theorem 4. L^{>λ}(A) is regular for each k-ambiguous PBA A and λ ∈ ]0, 1[.

Proof (sketch). We use the characterization of Lemma 5 to construct a generalized Büchi automaton accepting L^{>λ}(A). Intuitively, the new automaton guesses at most k different runs of A and verifies that the guessed runs are limit-deterministic and accepting. The automaton additionally tracks the probability of the runs over time, to determine whether the individual runs and their sum have enough "weight". The automaton rejects when the total probability of the guessed runs is ≤ λ, when one of the runs goes into the rejecting sink q_rej, or when a run does not see accepting states infinitely often.

By Lemma 5 we only need to consider sets of runs with at most one run that has probability < ε, where ε := ε_k is given by Lemma 4. For this single run we also do not need to track the exact probability value, as its only purpose is to witness that the acceptance probability is strictly greater than λ, whereas all other runs must have one of the finitely many different probabilities which are ≥ ε and must sum to λ. □

This generalizes the corresponding result for PFA [12, Theorem 3]. The proof in [12] uses similar concepts, though a rather different presentation. In the setting of infinite words we additionally have to deal with a single run that has arbitrarily low probability, and we have to ensure that this probability remains positive.

After seeing that finitely ambiguous PBA retain regularity, we show that this is the best we can do under threshold semantics:

Corollary 3. There are polynomially ambiguous PBA A, that is, with an IDA pattern but no EDA or IDA<sup>F</sup> patterns, such that L^{>λ}(A) is not regular even for rational thresholds λ ∈ ]0, 1[.

Proof. This follows from the fact that the PWA A from Figure 2(a), which recognizes a non-regular language (and is used to show Proposition 6), has just an IDA pattern in its underlying NBA, but no EDA or IDA<sup>F</sup> patterns. □

This completes our characterization of languages which are recognized by PBA that are restricted by forbidden ambiguity patterns, so that we can state our main result of this section (see Figure 1 for a visualization):

Theorem 5. The following results hold about PBA with restricted ambiguity:

– L^{>0}(k-PBA) = L^{>0}(ℵ₀-PBA) = L(NBA)
– L^{=1}(k-PBA) = L^{=1}(2^k-PBA) = L^{=1}(flat PBA) = L(DBA) ⊂ L^{=1}(ℵ₀-PBA)
– L^{>λ}(k-PBA) = L(NBA) ⊂ L^{>λ}(n^k-PBA)

Proof. The statements follow from the following inclusion chains:

$$\begin{array}{c} \mathbb{L}(\mathsf{NBA}) \stackrel{(1.)}{\subseteq} \mathbb{L}^{>0}(k\text{-}\mathsf{PBA}) \stackrel{def.}{\subseteq} \mathbb{L}^{>0}(\aleph\_0\text{-}\mathsf{PBA}) \stackrel{(2.)}{\subseteq} \mathbb{L}(\mathsf{NBA})\\ \mathbb{L}(\mathsf{DBA}) \stackrel{(3.)}{\subseteq} \mathbb{L}^{=1}(k\text{-}\mathsf{PBA}) \stackrel{def.}{\subseteq} \mathbb{L}^{=1}(2^{k}\text{-}\mathsf{PBA} \cup \text{flat }\mathsf{PBA}) \stackrel{(4.)}{\subseteq} \mathbb{L}(\mathsf{DBA}) \stackrel{(5.)}{\subset} \mathbb{L}^{=1}(\aleph\_0\text{-}\mathsf{PBA})\\ \mathbb{L}(\mathsf{NBA}) \stackrel{(1.)}{\subseteq} \mathbb{L}^{>0}(k\text{-}\mathsf{PBA}) \stackrel{(6.)}{\subseteq} \mathbb{L}^{>\lambda}(k\text{-}\mathsf{PBA}) \stackrel{(7.)}{\subseteq} \mathbb{L}(\mathsf{NBA}) \stackrel{(8.)}{\subset} \mathbb{L}^{>\lambda}(n^{k}\text{-}\mathsf{PBA}) \end{array}$$

The marked relationships hold due to: (1.) Theorem 1, (2.) Theorem 2, (3.) Proposition 2, (4.) Theorem 3, (5.) Proposition 3, (6.) a simple transformation adding a new accepting sink q_acc and modifying the initial distribution μ_0 [4, Lemma 4.16], (7.) Theorem 4, (8.) Corollary 3, and (def.) the definition of the ambiguity-restricted automata classes. □

## 4 Complexity results

In this section, we state some upper and lower bounds on the complexity for deciding emptiness and universality for PBA with restricted ambiguity, derived from the characterizations and constructions presented above.

#### Theorem 6.


Proof. (1. + 2.): By Theorem 2 the languages of ℵ₀-PBA^{>0} are regular. The construction of an NBA just uses two copies of the given PBA. For emptiness, it thus suffices to guess an accepted ultimately periodic word and verify that it is accepted by the NBA, which can be done in NL. Since universality for NBA is in PSPACE [21], we also obtain (2.).

(3.): If the automaton is at most exponentially ambiguous, there are only finitely many accepting runs on each word, and as we know by Lemma 2 that w ∈ L^{=1}(A) iff all runs are accepting, it suffices to guess a rejecting run in A⁻, which implies that the ultimately periodic word w labelling that run cannot be in L^{=1}(A). If the automaton is flat, then we know that for each rejected word there must exist a limit-deterministic rejecting run in the underlying NBA, which we can also guess. □
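The NL emptiness test in the proof guesses a lasso; done deterministically, it is just graph reachability. A sketch (ours, with the NBA encoded as nested dicts `state → symbol → set of successors`):

```python
def nba_nonempty(delta, initial, accepting):
    """NBA emptiness test underlying Theorem 6: the language is
    non-empty iff some accepting state f is reachable from an initial
    state and lies on a cycle, i.e. there is a 'lasso' labelling an
    ultimately periodic accepted word."""
    def reach(starts):
        seen, todo = set(starts), list(starts)
        while todo:
            p = todo.pop()
            for a, succs in delta.get(p, {}).items():
                for q in succs:
                    if q not in seen:
                        seen.add(q)
                        todo.append(q)
        return seen
    for f in reach(initial) & set(accepting):
        # f must reach itself via a non-empty path: search from successors
        succs = {q for qs in delta.get(f, {}).values() for q in qs}
        if f in reach(succs):
            return True
    return False
```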


Table 1: Summary of main results from Theorems 5 and 6 concerning PBA with ambiguity restrictions. The completeness results follow from the hardness results for HPBA (which are subsumed by flat PBA) from [8, Section 5], the PSPACE inclusion of universality for almost-sure ℵ0-PBA follows from [8, Theorem 4.4].

Observe that ℵ₀-PBA^{>0} subsume HPBA^{>0}, and the union of flat PBA^{=1} and exponentially ambiguous PBA^{=1} subsumes HPBA^{=1}, while preserving the same complexity of the emptiness and universality problems. A summary of the main results from Theorem 5 and Theorem 6 is presented in Table 1.

We conclude with an observation relevant to the question of the feasibility of PBA with restricted ambiguity for applications such as model checking or synthesis.

### Proposition 5 (Relationship to classical formalisms).

Translating LTL formulas into ℵ₀-PBA^{>0} incurs a doubly-exponential blow-up in the worst case, and translating NBA into ℵ₀-PBA^{>0} an exponential one.

Proof. It is known [20, Theorem 2] that there is a doubly-exponential lower bound for the translation from LTL to LDBA. It is also known that LTL can be translated to NBA with an exponential blow-up (e.g. [5, Theorem 5.42]), which together implies an exponential lower bound from NBA to LDBA.

By Theorem 2 there is a polynomial transformation from countably ambiguous PBA with positive semantics into LDBA, which together with the aforementioned bounds implies the claimed lower bounds. 

## 5 Weakness in Probabilistic Büchi Automata

In this section we investigate the class of probabilistic weak automata (PWA), establishing the relation between different classes defined by PWA as shown in Figure 3 (see also the description of our contribution in the introduction).

As a first remark, notice that PWA can be "complemented" by inverting accepting and rejecting states and switching between dual semantics: for a PWA A, the language L^{>0}(A) is the complement of L^{=1}(Ā), where Ā is A with inverted accepting state set Q \ F.

Since the overarching theme of this paper is to find regular subclasses of PBA, we next establish the following result, showing that there is no hope of finding a complete syntactic characterization of regularity in PBA:

Theorem 7. The regularity of PWA (and therefore of PBA) under positive, almost-sure and threshold semantics is an undecidable problem.

Proof (sketch). Since L^{>λ}(PWA) ⊇ L^{>0}(PWA) (see Theorem 10), L^{>0}(PWA) consists of the complements of the languages in L^{=1}(PWA), and the class of regular ω-languages is closed under complement, it suffices to show the statement for PWA^{=1}. We do this by reduction from the value 1 problem for PFA, which asks whether for each ε > 0 there exists a word accepted by the PFA with probability > 1 − ε. This problem is known to be undecidable [13]. We consider a slightly modified version of the problem by assuming that no word is accepted with probability 1 by the given PFA. The problem remains undecidable under this assumption, because one can check whether a PFA accepts some finite word with probability 1 by a simple subset construction.

Given some PFA A, we construct a PWA^{=1} B by taking a copy of A and extending it with a new symbol # such that from accepting states of A the automaton is "restarted" on #, while from non-accepting states # leads into a new part which ensures that infinitely many # are seen and which contains the only accepting state of B. We show that L^{=1}(B) = (Σ^*#)^ω \ R, where R = ∅ if A does not have value 1, and R is non-empty but contains no ultimately periodic word otherwise. This implies that L^{=1}(B) is regular iff A does not have value 1. □

Fig. 3: Illustration of the relationships between the classes of languages accepted by probabilistic weak automata under various semantics and other already known classes. The overlapping patterns indicate intersections of classes, where dots mark L^{>0}(PBA), and the two kinds of diagonal lines mark L^{=1}(PBA) and its complement class, respectively. The dashed line indicates intersections with different subclasses of regular languages. The class L^{>λ}(PBA) contains all the other depicted classes; L^{>λ}(PWA) contains the area inside the thick line. The depicted fact that L^{>0}(PWA) = L^{>λ}(PWA) ∩ L^{>0}(PBA) is a conjecture; one direction is shown in Theorem 10.
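The subset-construction check mentioned in the proof of Theorem 7 can be sketched as follows (our encoding; the PFA is assumed complete, and only the support of the state distribution is tracked, since exact probabilities are irrelevant for sure acceptance):

```python
def pfa_surely_accepts_some_word(delta, initial, final):
    """A complete PFA accepts some finite word with probability 1 iff
    a non-empty support set S with S <= F is reachable, where the
    support is the set of states carrying positive probability."""
    F = set(final)
    start = frozenset(initial)
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        if S and S <= F:            # all positive-probability mass in F
            return True
        for a in {a for q in S for a in delta.get(q, {})}:
            T = frozenset(q2 for q in S
                          for q2 in delta.get(q, {}).get(a, ()))
            if T and T not in seen:
                seen.add(T)
                todo.append(T)
    return False
```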

We will now show that PWA with almost-sure semantics are as expressive as PBA, and with positive semantics as expressive as PCA.

## Theorem 8. L^{>0}(PWA) = L^{>0}(PCA) and L^{=1}(PWA) = L^{=1}(PBA).

Proof (sketch). It suffices to show the first statement. The second then follows by duality: we can interpret a PBA^{=1} A recognizing L as a PCA^{>0} recognizing the complement of L, and apply the construction to obtain a PWA^{>0} B for the complement of L, such that B with inverted accepting and rejecting states is a PWA^{=1} for L. In the first statement the inclusion L^{>0}(PWA) ⊆ L^{>0}(PCA) is trivial, hence we only need to show that L^{>0}(PCA) ⊆ L^{>0}(PWA).

We construct a PWA>0 consisting of two copies of the original PCA>0, a guess copy and a verify copy. In the guess copy, the automaton can guess that no final states will be visited anymore and switch to the verify copy, which is accepting, but in which all transitions into final states are redirected to a rejecting sink.
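The shape of this two-copy construction can be sketched concretely. The following is a toy sketch under an assumed encoding (the transition function as a dict of successor distributions); the function name `guess_verify` and the fixed switching mass `eps = 0.5` used to realize the probabilistic "guess" are our own illustration, not the paper's formal construction:

```python
def guess_verify(states, delta, final, alphabet):
    """From a co-Buechi automaton (PCA>0) build the two-copy weak automaton.

    delta: dict (q, a) -> {q': prob}; final: the co-Buechi set F.
    Returns (new_delta, accepting), where accepting is the set of
    weak-accepting states (the whole verify copy).
    """
    eps = 0.5  # probability mass diverted to realize the "guess"
    new_delta = {}
    for a in alphabet:
        new_delta[('sink', a)] = {'sink': 1.0}  # rejecting sink loops forever
    for (q, a), succ in delta.items():
        g, v = {}, {}
        for q2, p in succ.items():
            # guess copy: keep simulating, but send part of the mass to verify
            g[('g', q2)] = g.get(('g', q2), 0.0) + p * (1 - eps)
            # transitions into final states are redirected to the sink
            tgt = 'sink' if q2 in final else ('v', q2)
            g[tgt] = g.get(tgt, 0.0) + p * eps
            # verify copy: accepting, but F is forbidden
            v[tgt] = v.get(tgt, 0.0) + p
        new_delta[(('g', q), a)] = g
        new_delta[(('v', q), a)] = v
    accepting = {('v', q) for q in states}
    return new_delta, accepting
```

A run accepts weakly iff it eventually stays inside the verify copy without touching the sink, mirroring the intended "no final states anymore" guess.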

Next, we show that languages that can be accepted both by a PWA with almost-sure semantics and by a PWA with positive semantics are regular and can be accepted by a DWA. For the proof, we rely on a characterization of DWA languages in terms of the Myhill-Nerode equivalence relation from [22]. So we first define this equivalence and show that languages defined by PBA with positive semantics have only finitely many equivalence classes. Then we come back to the result for PWA.

For L ⊆ Σ^ω, define the Myhill-Nerode equivalence relation ∼L ⊆ Σ* × Σ* by u ∼L v iff (uw ∈ L ⇔ vw ∈ L) for all w ∈ Σ^ω. Then the following holds:

#### Lemma 6 (Finitely many Myhill-Nerode classes).

Languages in L>0(PBA) have finitely many Myhill-Nerode equivalence classes.

Proof. Let A = (Q, Σ, δ, μ0, F) be some PBA>0, let u ∈ Σ* be some word, and let μu := δ*(μ0, u) be the probability distribution on the states of A after reading u. Pick any w ∈ Σ^ω and notice that uw ∈ L = L>0(A) iff there exists some state q such that μu(q) > 0 and the probability to accept w from q is also > 0, as the product of two positive numbers is again positive. But then, for any two u, v ∈ Σ*, whenever μu(q) > 0 ⇔ μv(q) > 0 for all q, we have uw ∈ L ⇔ vw ∈ L for all w ∈ Σ^ω by the reasoning above, as the exact values do not matter for acceptance, and therefore u ∼L v. But as there are at most 2^|Q| possibilities for which entries of a distribution μ over Q are positive, this is an upper bound on the number of equivalence classes.
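The counting argument can be replayed concretely: acceptance of uw depends only on which states carry positive mass after reading u, and the support of μua depends only on the support of μu. Under a hypothetical dict-based encoding of δ (our own, for illustration), the reachable supports can therefore be enumerated as a fixpoint over subsets of Q, of which there are at most 2^|Q|:

```python
def reachable_supports(delta, init, alphabet):
    """Enumerate all supports supp(mu_u) for u in Sigma^*.

    delta: dict (q, a) -> {q': prob}; init: dict q -> prob.
    Since supp(mu_ua) is a function of supp(mu_u) alone, a set-valued
    fixpoint computation suffices; at most 2^|Q| sets can appear.
    """
    start = frozenset(q for q, p in init.items() if p > 0)
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for a in alphabet:
            s2 = frozenset(q2 for q in s
                           for q2, p in delta.get((q, a), {}).items() if p > 0)
            if s2 not in seen:
                seen.add(s2)
                todo.append(s2)
    return seen
```

Each support corresponds to (at most) one Myhill-Nerode class, which is exactly the bound used in the proof.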

## Theorem 9. L>0(PWA) ∩ L=1(PWA) = L(DWA) = L(PWA0/1)

Proof. The inclusions L(DWA) ⊆ L(PWA0/1) ⊆ L>0(PWA) ∩ L=1(PWA) are trivial, hence it remains to show L>0(PWA) ∩ L=1(PWA) ⊆ L(DWA).

So let L be a language from L>0(PWA) ∩ L=1(PWA). We want to show that L can be accepted by a DWA. We use the following characterization of DWA languages [22, Theorem 21]: the DWA languages are precisely the languages with finitely many Myhill-Nerode classes in the class Gδ ∩ Fσ of the Borel hierarchy. The classes Gδ and Fσ of the Borel hierarchy are often also referred to as Π2 and Σ2. We do not introduce the details of this hierarchy here, but rather refer the reader not familiar with these concepts to [22] and [8].

We already know that L has finitely many Myhill-Nerode classes by Lemma 6 (as PWA are special cases of PBA). It remains to show that L is in the class Gδ ∩ Fσ. It is known that PBA with almost-sure semantics define languages in Gδ [8, Lemma 3.2]. Hence L is in Gδ. Since L is accepted by a PWA with positive semantics, the complement of L is accepted by a PWA with almost-sure semantics (as noted at the beginning of this section). We obtain that the complement of L is also in Gδ, again by [8, Lemma 3.2]. This means that L is in Fσ, which by definition consists of the complements of languages from Gδ.

Concluding this section, we show a result about weak automata with threshold semantics, which (not surprisingly) turn out to be even more expressive. A careful analysis of the PWA A in Fig. 2(a) shows the following result:

Proposition 6. For all thresholds λ ∈ ]0, 1[ there exists a PWA A such that L>λ(A) is not regular and not PBA>0-recognizable.

Putting things together, we can say the following about threshold PWA, establishing the relation of L>λ(PWA) to the other classes in Fig. 3:

#### Theorem 10 (Expressive power of threshold PWA).


Proof. (1.) L>0(PWA) ⊆ L>0(PBA) holds by definition, and L>0(PWA) ⊆ L>λ(PWA), as any PWA>0 can be modified into a PWA>λ recognizing the same language by just adding an additional accepting sink and modifying the initial distribution, as described in [4, Lemma 4.16] for general PBA.

(2.) By Proposition 6, there are languages recognized by PWA>λ that cannot be recognized by PBA>0. To show that, conversely, there are languages accepted by PBA>0 that cannot be accepted by PWA>λ, we give a topological characterization of the languages accepted by PWA by a simple adaptation of [8, Lemma 3.2] and combine it with further results shown in [8].

(3.) The first inclusion was discussed in (1.); the strictness follows from Proposition 6 and the fact that L>0(PWA) = L=1(PBA) ⊂ BCl(L=1(PBA)) = L>0(PBA), where the first equality is Theorem 8 and the second is shown in [8]. The second inclusion of the statement follows from (2.) and the fact from [4] that L>0(PBA) ⊂ L>λ(PBA).

For the dual class L≥λ(PWA) one can show symmetric results corresponding to statements (1.) and (2.) above; for statement (3.), however, there is as yet no proof of the strictness of the inclusions (especially the second one), whereas the statement L=1(PWA) ⊆ L≥λ(PWA) ⊆ L≥λ(PBA) is obvious. We leave this issue as an open question. Another interesting question is whether > λ is equivalent to < λ (or dually ≥ to ≤).

## 6 Conclusion

By using notions from ambiguity in classical Büchi automata, we were able to extend the set of easily (syntactically) checkable PBA that are regular under some or all of the usual semantics. As a consequence, ambiguity appears to be an even more interesting notion in the probabilistic setting, as here it in fact has consequences for the expressive power of automata, whereas in the classical setting there is no such effect. Our results also indicate that obtaining non-regularity requires certain structural patterns which at least imply the existence of the ambiguity patterns that we used. It is an open question whether it is possible to identify more fine-grained syntactic characterizations, patterns, or easily checkable properties which are merely over-approximated by the ambiguity patterns and are required for non-regularity.

## References



## **Local Local Reasoning: A BI-Hyperdoctrine for Full Ground Store**⋆

Miriam Polzer and Sergey Goncharov

FAU Erlangen-Nürnberg, Erlangen, Germany {miriam.polzer,sergey.goncharov}@fau.de

**Abstract.** Modelling and reasoning about dynamic memory allocation is one of the well-established strands of theoretical computer science, which is particularly well-known as a source of notorious challenges in semantics, reasoning, and proof theory. We capitalize on recent progress on categorical semantics of *full ground store*, in terms of a *full ground store monad*, to build a corresponding semantics of a higher order logic over the corresponding programs. Our main result is a construction of an *(intuitionistic) BI-hyperdoctrine*, which is arguably the semantic core of higher order logic over local store. Although we have made extensive use of the existing generic tools, certain principled changes had to be made to enable the desired construction: while the original monad works over total heaps (to disable dangling pointers), our version involves partial heaps (*heaplets*) to enable compositional reasoning using separating conjunction. Another remarkable feature of our construction is that, in contrast to the existing generic approaches, our BI-algebra does not directly stem from an internal categorical partial commutative monoid.

## **1 Introduction**

Modelling and reasoning about dynamic memory allocation is a sophisticated subject in denotational semantics with a long history (e.g. [19,15,14,16]). Denotational models for dynamic references vary over a large spectrum, and in fact, in two dimensions: depending on the expressivity of the features being modelled (*ground store* – *full ground store* – *higher order store*) and depending on the amount of *intensional* information included in the model (*intensional* – *extensional*), using the terminology of Abramsky [1].

Recently, Kammar et al. [9] constructed an extensional monad-based denotational model of *full ground store*, i.e. permitting not only memory allocation for discrete values, but also storing mutually linked data. The key idea of the latter work is an explicit delineation between the target presheaf category [**W**, **Set**] on which the full ground store monad acts, and an auxiliary presheaf category [**E**, **Set**] of *initializations*, naturally hosting a *heap functor* H. The latter category also hosts a *hiding monad* P, which can be loosely understood as a semantic

<sup>⋆</sup> Sergey Goncharov acknowledges support by the German Research Foundation (DFG) under project GO 2161/1-2.

© The Author(s) 2020

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 542–561, 2020. https://doi.org/10.1007/978-3-030-45231-5\_28

Fig. 1: Construction of the full ground store monad.

mechanism for idealized garbage collection. The full ground store monad is then assembled according to the scheme given in Fig. 1. As a slogan: the *local* store monad is a *global* store monad transform of the hiding monad sandwiched within a geometric morphism.

The fundamental reason why extensional models of local store involve intricate constructions such as presheaf categories is that the desirable program equalities include

$$\begin{aligned}
\mathsf{let}\ \ell \coloneqq \mathsf{new}\ v;\ \ell' \coloneqq \mathsf{new}\ w\ \mathsf{in}\ p &= \mathsf{let}\ \ell' \coloneqq \mathsf{new}\ w;\ \ell \coloneqq \mathsf{new}\ v\ \mathsf{in}\ p && (\ell \neq \ell') \\
\mathsf{let}\ \ell \coloneqq \mathsf{new}\ v\ \mathsf{in}\ \mathsf{ret}\ \mathsf{t} &= \mathsf{ret}\ \mathsf{t} \\
\mathsf{let}\ \ell \coloneqq \mathsf{new}\ v\ \mathsf{in}\ (\mathsf{if}\ \ell = \ell'\ \mathsf{then}\ \mathsf{true}\ \mathsf{else}\ \mathsf{false}) &= \mathsf{false} && (\ell \neq \ell')
\end{aligned}$$

and these jointly do not have set-based models over countably infinite sets of locations [23, Proposition 6]. The first equation expresses irrelevance of the memory allocation order, the second expresses the fact that an unused cell is always garbage collected and the third guarantees that allocation of a fresh cell does indeed produce a cell different from any other. The aforementioned construction validates these equations and enjoys further pleasant properties, e.g. soundness and adequacy of a higher order language with user defined storable data structures.
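The failure of the first equation on raw set-based heaps can be seen in a toy model (ours, not the paper's semantics): a naive allocator handing out locations from a global counter assigns different concrete addresses under the two allocation orders, so the resulting heaps are equal only up to a bijective renaming of locations, which is precisely what quotienting by the injections of **W** achieves.

```python
import itertools

def run(prog):
    """Naive global-counter allocator: execute a list of (name, value)
    allocations in order.  A toy set-based model for illustration only."""
    heap, env, counter = {}, {}, itertools.count()
    for name, val in prog:
        loc = next(counter)   # fresh locations are handed out in order
        heap[loc] = val
        env[name] = loc
    return heap, env

# the two allocation orders of the first equation above
h1, e1 = run([('l', 'v'), ('lp', 'w')])
h2, e2 = run([('lp', 'w'), ('l', 'v')])
assert h1 != h2  # raw heaps disagree on concrete locations...
# ...but agree up to renaming: the same values sit behind the same names
assert {h1[e1[n]] for n in e1} == {h2[e2[n]] for n in e2}
```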

The goal of our present work is to complement the semantics of programs over local store with a corresponding principled semantics of *higher order logic*. More specifically, in order to be able to specify and reason modularly about local store, we seek a model of higher order *separation logic* [21]. It has been convincingly argued in previous work on categorical models of separation logic [2,3] that a core abstraction device unifying such models is the notion of *BI-hyperdoctrine*, extending Lawvere's hyperdoctrines [10], which provide the corresponding abstraction for first order logic. BI-hyperdoctrines are standardly built on *BI-algebras*, which are in turn standardly constructed from *partial commutative monoids (pcm)*, or more generally from *resource algebras* as in Iris, the state-of-the-art framework for higher order separation logic [8]. One subtlety our construction reveals is that it does not seem to be possible to obtain a *BI-algebra* following general recipes from a pcm (or a resource algebra), due to the inherently local nature of the storage model, which does not allow one to canonically map store contents into a global address space. Another subtlety is that the devised logic is necessarily non-classical, which is intuitively explained by the fact that the semantics of programs must be suitably irrelevant to garbage collection, and in

our case this follows from entirely formal considerations (the Yoneda lemma). It is also worth mentioning that for this reason the logical theory that we obtain is incompatible with the standard (classical or intuitionistic) predicate logic. E.g. the formula ∃ℓ. ℓ ↦ 5 is always valid in our setup, which expresses the fact that a heap *potentially* contains a cell equal to 5 (which need not be reachable), in accord with the second equation above, and correspondingly, the formula ∀ℓ. ¬(ℓ ↦ 5) is unsatisfiable. This and other similar phenomena are explained by the fact that our semantics essentially behaves as a Kripke semantics along two orthogonal axes: (proof relevant) *cell allocation* and (proof irrelevant) *cell accessibility*. While the former captures a *programming* view of locality, the latter captures a *reasoning* view of locality, and as we argue (e.g. Example 26), they are generally mutually irreducible.

**Related previous work** As we already pointed out, we take inspiration from recent categorical approaches to modelling program semantics for dynamic references [9], as well as from higher order separation logic semantic frameworks [2]. Conceptually, the problem of combining separation logic with garbage collection mechanisms goes back to Reynolds [20], who indicated that the standard semantics of separation logic is not compatible with garbage collection, which we also reinforce with our construction. Calcagno et al. [4] addressed this issue by providing two models. The first model is based on total heaps, featuring the aforementioned effect of "potential" allocations. To cope with heap separation the authors introduced another model based on partial heaps, in which this effect disappears again and has to be compensated by syntactic restrictions on the assertion language.

**Plan of the paper** After preliminaries (Section 2), we give a modified presentation of a call-by-value language with full ground references and the full ground store monad (Sections 3 and 4) following the lines of [9]. In Section 5 we provide some general results for constructing semantics of higher order separation logics. The main development starts in Section 6, where we provide a construction of a BI-hyperdoctrine. We show some examples illustrating our semantics in Section 7 and draw conclusions in Section 8.

## **2 Preliminaries**

We assume basic familiarity with the elementary concepts of category theory [12,6], all the way up to monads, toposes, (co)ends and Kan extensions. We denote by |**C**| the class of objects of a category **C**; we often suppress subscripts of natural transformation components if no confusion arises.

In this paper, we work with special kinds of *covariant presheaf toposes*, i.e. functor categories of the form [**C**, **Set**], where **C** is small and satisfies the following *amalgamation condition*: for any f: a → b and g: a → c there exist g′: b → d and f′: c → d such that f′ ∘ g = g′ ∘ f. Such toposes are particularly well-behaved, and fall into the more general class of *De Morgan* toposes [7]. As presheaf toposes, De Morgan toposes are precisely characterized by the condition

$$\frac{\Gamma \vdash_{\mathsf{v}} \ell\colon \mathsf{Ref}_S \qquad \Gamma \vdash_{\mathsf{v}} v\colon \mathsf{CType}(S)}{\Gamma \vdash_{\mathsf{c}} \ell := v\colon 1}\,\textbf{(put)} \qquad \frac{\Gamma \vdash_{\mathsf{v}} \ell\colon \mathsf{Ref}_S}{\Gamma \vdash_{\mathsf{c}} \mathsf{get}\ \ell\colon \mathsf{CType}(S)}\,\textbf{(get)}$$

$$\frac{\Gamma, \ell_1\colon \mathsf{Ref}_{S_1}, \ldots, \ell_n\colon \mathsf{Ref}_{S_n} \vdash_{\mathsf{v}} v_i\colon \mathsf{CType}(S_i) \quad (i = 1, \ldots, n) \qquad \Gamma, \ell_1\colon \mathsf{Ref}_{S_1}, \ldots, \ell_n\colon \mathsf{Ref}_{S_n} \vdash_{\mathsf{c}} p\colon A}{\Gamma \vdash_{\mathsf{c}} \mathsf{letref}\ \ell_1 := v_1, \ldots, \ell_n := v_n\ \mathsf{in}\ p\colon A}\,\textbf{(new)}$$

#### Fig. 2: Term formation rules for memory management constructs.

that 2 = 1 + 1 is a retract of the subobject classifier Ω. More specifically, our **C** supports further useful structure, in particular a strict monoidal tensor ⊕ with jointly epic injections in₁, in₂, forming an *independent coproduct* structure, as recently identified by Simpson [22]. Moreover, if the coslices c → **C** support independent coproducts, we obtain *local independent coproducts* in **C**, which are essentially cospans c₁ → c₁ ⊕_c c₂ ← c₂ in c → **C**. Given ρ₁: c → c₁ and ρ₂: c → c₂, we thus always have ρ₁ ⊕ ρ₂: c₁ → c₁ ⊕_c c₂ and ρ₂ ⊕ ρ₁: c₂ → c₁ ⊕_c c₂, such that (ρ₁ ⊕ ρ₂) ∘ ρ₁ = (ρ₂ ⊕ ρ₁) ∘ ρ₂, and as a consequence, [**C**, **Set**] is a De Morgan topos. Intuitively, the category **C** represents worlds in the sense of *possible world semantics* [15,19]. A morphism ρ: a → b witnesses the fact that b is a *future* world w.r.t. a. Existence of local independent coproducts intuitively ensures that diverse futures of a given world can eventually be unified in a canonical way.

Every functor f: **C** → **D** induces a functor f^*: [**D**, **Set**] → [**C**, **Set**] by precomposition with f. By general considerations, there is a right adjoint f_*: [**C**, **Set**] → [**D**, **Set**], computed as Ran_f, the right Kan extension along f. This renders the adjunction f^* ⊣ f_* a *geometric morphism*; in particular, f^* preserves all finite limits.

## **3 A Call-by-Value Language with Local References**

To set the context, we consider the following higher order language of programs with local references, obtained by slightly adapting the language of Kammar et al. [9] to match the *fine-grain call-by-value* perspective [11]. This allows us to formally distinguish *pure* and *effectful* judgements. First, we postulate a collection of *cell sorts* S and then introduce further types with the grammar:

$$A, B\ \ldots \;::=\; 0 \mid 1 \mid A \times B \mid A + B \mid A \to B \mid \mathsf{Ref}_S \tag{1}$$

A type is *first order* if it does not involve the function type constructor A → B. We then fix a map CType, assigning a first order type to every given sort from S. We show the three term formation rules over these data specific to local store in Fig. 2. Here the v-indices at the turnstiles indicate *values* and the c-indices indicate *computations*. In **(put)** the cell referenced by ℓ is updated with a value v, **(get)** returns the value stored under the reference ℓ, and **(new)** simultaneously allocates new cells filled with the values v₁,...,vₙ and makes them accessible in p under the corresponding references ℓ₁,...,ℓₙ. A fine-grain call-by-value language is interpreted standardly in a category with a monad, which in our case must additionally provide a semantics for the rules **(put)**, **(get)** and **(new)**. We present this monad in detail in the next section.

**Example 1 (Doubly Linked Lists).** Let S = {*DLList*} and let CType(*DLList*) = 2 × (Ref_*DLList* + 1) × (Ref_*DLList* + 1), which indicates that a list element consists of a Boolean (i.e. an element of 2 = 1 + 1) and two pointers (forwards and backwards) to list elements, each of which may be missing. Note that we thus avoid empty lists and null-pointers: every list contains at least one element, and the elements added by +1 cannot be dereferenced. This example provides a suitable illustration for the letref construct. E.g. the program

$$\mathsf{letref}\ \ell_1 := (0, \mathsf{inr}\,\star, \mathsf{inl}\,\ell_2);\ \ell_2 := (1, \mathsf{inl}\,\ell_1, \mathsf{inr}\,\star)\ \mathsf{in}\ \mathsf{ret}\ \ell_1$$

simultaneously creates two list elements pointing to each other and returns a reference to the first one.
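A toy operational reading of this program can be sketched with the heap as a Python dict and references as integers; the helper `letref` below is our own hypothetical model (values may mention sibling cells via thunks), not the paper's semantics:

```python
def letref(heap, bindings):
    """Simultaneously allocate mutually referring cells.

    bindings maps symbolic names to thunks; each thunk receives the
    name-to-location table, so a value may refer to sibling cells
    allocated in the same letref.  Toy model for illustration.
    """
    locs = {name: len(heap) + i for i, name in enumerate(bindings)}
    for name, mk_val in bindings.items():
        heap[locs[name]] = mk_val(locs)  # values may point at sibling cells
    return locs

heap = {}
locs = letref(heap, {
    'l1': lambda L: (0, ('inr', None), ('inl', L['l2'])),
    'l2': lambda L: (1, ('inl', L['l1']), ('inr', None)),
})
# the two list elements now point at each other
assert heap[locs['l1']][2] == ('inl', locs['l2'])
assert heap[locs['l2']][1] == ('inl', locs['l1'])
```

The simultaneity matters: neither cell could be allocated first by two separate `new`s without a placeholder, which is exactly what letref avoids.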

## **4 Full Ground Store in the Abstract**

We proceed to present the full ground store monad by slightly tweaking the original construction [9] towards higher generality. The main distinction is that we do not resort to any specific program syntax and proceed in a completely axiomatic manner in terms of functors and natural transformations. This mainly serves the purpose of developing our logic in Section 6, which will require a coherent upgrade of the present model. Besides this, in this section we demonstrate the flexibility of our formulation by showing that it also instantiates to the model previously developed by Plotkin and Power [16] (Theorem 8).

Our present formalization is parametric in three aspects: the set of *sorts* S, the set of *locations* L, and a map range, introduced below for interpreting S. We assume that L is canonically isomorphic to the set of natural numbers N under #: L ≅ N. Using this isomorphism, we commonly use the "shift of ℓ ∈ L by n ∈ N", defined as follows: ℓ + n = #⁻¹(#ℓ + n).

**Heap layouts and abstract heap(let)s** Let **W** be the category of *(heap) layouts* and injections, defined as follows: an object w ∈ |**W**| is a finitely supported partial function w: L ⇀_fin S, and a morphism ρ: w → w′ is a type preserving injection ρ: dom w → dom w′, i.e. for all ℓ ∈ dom w, w(ℓ) = w′(ρ(ℓ)). We will equivalently view w as a left-unique subset of L × S and hence use the notation (ℓ: S) ∈ w as an equivalent of w(ℓ) = S. Injections ρ: w → w′ with the property that ρ(ℓ) = ℓ for all (ℓ: S) ∈ w we also call *inclusions* and write w ⊆ w′ instead of ρ: w → w′, for obviously there is at most one inclusion from w to w′. If w ⊆ w′ then we call w a *sublayout* of w′. We next postulate

$$\mathsf{range} \colon \mathcal{S} \to [\mathbf{W}, \mathbf{Set}].$$

The idea is, given a sort S P S and a heap layout w P |**W**|, rangepSqpwq yields the set of possible values for cells of type S over w.

**Example 2.** Assuming the grammar (1) and a corresponding map CType, a generic type A is interpreted as a presheaf ⟦A⟧: **W** → **Set** by obvious structural induction, e.g. ⟦A × B⟧ = ⟦A⟧ × ⟦B⟧, except for the clause for Ref, for which ⟦Ref_S⟧(w) = w⁻¹(S). This yields the following definition for range: range(S) = ⟦CType(S)⟧ [9].

**Example 3 (Simple Store).** By taking S = {⋆}, L = N (natural numbers) and range(⋆)(w) = V, where V is a fixed set of *values*, we essentially obtain the model previously explored by Plotkin and Power [16]. We reserve the term *simple store* for this instance. Simple store is a ground store (since range is a constant functor); moreover this store is untyped (since S = {⋆}) and the locations L are precisely the natural numbers.

A *heap* over a layout w assigns to each (ℓ: S) ∈ w an element from range(S)(w). More generally, a *heaplet* over w assigns an element from range(S)(w) to *some*, possibly not all, (ℓ: S) ∈ w. We thus define the following *heaplet bifunctor* H: **W**^op × **W** → **Set**:

$$\mathcal{H}(w^-, w^+) = \prod_{(\ell\colon S)\,\in\, w^-} \mathsf{range}(S)(w^+)$$

and identify the elements of H(w⁻, w⁺) with heaplets and the elements of H(w, w) with heaps. Of course, we intend to use H(w⁻, w⁺) for such w⁻ and w⁺ that the former is a sublayout of the latter. The contravariant action of H is given by projection and the covariant action is induced by functoriality of range(S):

$$\begin{aligned} \mathfrak{pr}_{(\ell\colon S)}\bigl(\mathcal{H}(w^-,\, \rho_1\colon w_1^+ \to w_2^+)(\eta \in \mathcal{H}(w^-, w_1^+))\bigr) &= \mathsf{range}(S)(\rho_1)(\mathfrak{pr}_{(\ell\colon S)}\,\eta) \\ \mathfrak{pr}_{(\ell\colon S)}\bigl(\mathcal{H}(\rho_2\colon w_2^- \to w_1^-,\, w^+)(\eta \in \mathcal{H}(w_1^-, w^+))\bigr) &= \mathfrak{pr}_{(\ell\colon S)}\,\eta \end{aligned}$$

The heaplet functor preserves independent coproducts; we overload the ⊕ operation with the isomorphism ⊕: H(w₁, w) × H(w₂, w) ≅ H(w₁ ⊕ w₂, w).
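For a concrete toy reading of the two actions, take heaplets as dicts keyed by (location, sort) pairs; the helper names `restrict` and `push` are ours, and the string-based sort encoding is an assumption for illustration:

```python
def restrict(heaplet, cells):
    """Contravariant action along an inclusion w2- ⊆ w1-: plain projection
    of the heaplet onto the smaller layout."""
    return {cell: heaplet[cell] for cell in cells}

def push(heaplet, rename):
    """Covariant action along an injection w1+ -> w2+: stored locations
    (held by Ref-sorted cells) are renamed; ground values are untouched."""
    out = {}
    for (loc, sort), val in heaplet.items():
        out[(loc, sort)] = rename.get(val, val) if sort.startswith('Ref') else val
    return out

# heaplet over w- = {l1: Int, l2: RefInt}, living in w+ = {l1, l2, l3};
# locations are encoded as the integers 1, 2, 3
s2 = {(1, 'Int'): 3, (2, 'RefInt'): 3}   # l2 points to l3
assert restrict(s2, [(1, 'Int')]) == {(1, 'Int'): 3}
assert push(s2, {3: 7}) == {(1, 'Int'): 3, (2, 'RefInt'): 7}
```

Note how `push` leaves the integer 3 stored at the Int cell alone while renaming the identical-looking 3 stored at the Ref cell: only Ref-sorted contents live in the layout and hence transport along it.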

**Example 4.** For illustration, consider the following simplistic example. Let S = {*Int*, Ref_*Int*, Ref_Ref_*Int*, ...}, where *Int* is meant to capture the ground type of integers and, recursively, Ref_A is the type of pointers to A. Then we put

$$\mathsf{range}(\mathit{Int})(w) = \mathbb{Z}, \quad \mathsf{range}(\mathsf{Ref}_S)(w) = w^{-1}(S) = \{\ell \in \mathsf{dom}\, w \mid w(\ell) = S\}.$$

For a heaplet example, consider w⁻ = {ℓ₁: *Int*, ℓ₂: Ref_*Int*} and w⁺ = {ℓ₁: *Int*, ℓ₂: Ref_*Int*, ℓ₃: *Int*}. Hence, w⁻ is a sublayout of w⁺. By viewing the elements of H(w⁻, w⁺) as lists of assignments on w⁻, we can define s₁, s₂ ∈ H(w⁻, w⁺) as follows: s₁ = [ℓ₁: *Int* ↦ 5, ℓ₂: Ref_*Int* ↦ ℓ₁], s₂ = [ℓ₁: *Int* ↦ 3, ℓ₂: Ref_*Int* ↦ ℓ₃]. The heaplets s₁ and s₂ can be graphically presented as follows:

Fig. 3: Local independent coproduct

The category **W** supports the (local) independent coproducts described in Section 2. These are constructed as follows. For w, w′ ∈ |**W**|, w ⊕ w′ = w ∪ {ℓ + n + 1: S | (ℓ: S) ∈ w′} with n being the largest index for which w is defined on #⁻¹(n). This yields a strict monoidal structure ⊕: **W** × **W** → **W**. Intuitively, w₁ ⊕ w₂ is a canonical disjoint sum of w₁ and w₂, but note that ⊕ is not a coproduct in **W** (e.g. there is no ∇: 1 ⊕ 1 → 1, for **W** only contains injections). For every ρ: w₁ → w₂, there is a canonical complement ρ^∁: w₂ ⊖ ρ → w₂ whose domain w₂ ⊖ ρ = w₂ ∖ img ρ consists of all those cells (ℓ: S) ∈ w₂ that ρ misses. Given two morphisms ρ₁: w → w₁ and ρ₂: w → w₂, we define the local independent coproduct ρ₁ ⊕_w ρ₂ as the layout consisting of the locations from w, and the ones from w₁ and w₂ which are neither in the image of ρ₁ nor in the image of ρ₂:

$$
\rho\_1 \oplus\_w \rho\_2 = w \oplus (w\_1 \ominus \rho\_1) \oplus (w\_2 \ominus \rho\_2).
$$

There are morphisms ρ₁ ⊕ ρ₂: w₁ → ρ₁ ⊕_w ρ₂ and ρ₂ ⊕ ρ₁: w₂ → ρ₁ ⊕_w ρ₂ such that

$$(\rho_1 \oplus \rho_2) \circ \rho_1 = (\rho_2 \oplus \rho_1) \circ \rho_2.$$

Fig. 3 illustrates this definition with a concrete example.
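In the simple-store instance, where locations are the natural numbers (so #ℓ = ℓ), the tensor on layouts can be sketched directly; the name `oplus` and the dict encoding of layouts are ours:

```python
def oplus(w1, w2):
    """w1 ⊕ w2: shift every location of w2 past the largest location of w1.

    Layouts are dicts location -> sort, with L = N as in the simple model.
    An empty w1 yields shift 0, so the empty layout is a strict unit.
    """
    n = max(w1, default=-1)
    return {**w1, **{loc + n + 1: s for loc, s in w2.items()}}

w1 = {0: 'Int', 2: 'RefInt'}
w2 = {0: 'Int'}
assert oplus(w1, w2) == {0: 'Int', 2: 'RefInt', 3: 'Int'}  # w2 shifted past index 2
assert oplus({}, w2) == w2                                 # unit law on the nose
# strictness of the monoidal structure: associativity on the nose
assert oplus(oplus(w1, w2), w2) == oplus(w1, oplus(w2, w2))
```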

**Initialization and hiding** Note that in the simple store model (Example 3), H is equivalently a contravariant functor H: **W**^op → **Set** with Hw = V^w, hence H can be placed e.g. in [**W**^op, **Set**]. In general, H is mixed-variant, which calls for a more ingenious category in which H can be placed. Designing such a category is indeed the key insight of [9]. Closely following this work, we introduce a category **E**, whose objects are the same as those of **W**, and whose morphisms (ρ, η) ∈ **E**(w, w′), called *initializations*, consist of an injection ρ: w → w′ and a heaplet η ∈ H(w′ ⊖ ρ, w′):

$$\mathbf{E}(w, w') = \sum\_{\rho \colon w \to w'} \mathcal{H}(w' \ominus \rho, w').$$

Recall that the morphism ρ: w → w′ represents a move from a world with w allocated memory cells to a world with w′ allocated memory cells. A morphism of **E** is a morphism of **W** augmented with a heaplet part η, which provides the information how the newly allocated cells in w′ ⊖ ρ are filled. The heap functor can now be viewed as a representable presheaf H: **E** → **Set**, essentially because, by definition, Hw = H(w, w) ≅ **E**(∅, w). Let us agree to use the notation ρ: w ⇝ w′ for morphisms in **E**, to avoid confusion with the morphisms in **W**.

Like **W**, **E** supports local independent coproducts, but remarkably **E** does not have vanilla independent coproducts, due to the fact that **E** does not have an initial object. That is, in turn, because defining an initial morphism would amount to defining canonical fresh values for newly allocated cells, but those need not exist. The local independent coproducts of **W** and **E** agree in the sense that we can *promote* an initialization (ρ₂, η): w ⇝ w₂ along an injection ρ₁: w → w₁ to obtain an initialization ρ₁ ⊕ (ρ₂, η): w₁ ⇝ ρ₁ ⊕_w ρ₂. This is accomplished by mapping the heaplet structure η forward along ρ₂ ⊕ ρ₁: w₂ → ρ₁ ⊕_w ρ₂.

**Hiding monad** Recall that the local store is supposed to be insensitive to garbage collection. This is captured by identifying the stores that agree on their observable parts, using the *hiding monad* $P$ defined on $[\mathbf{E}, \mathbf{Set}]$ as follows:

$$(PX)w = \int^{\rho \colon w \to w' \in w \downarrow \mathsf{u}} Xw'. \tag{2}$$

Here, $\mathsf{u} \colon \mathbf{E} \to \mathbf{W}$ is the obvious heaplet-discarding functor $\mathsf{u}(\rho, \eta) = \rho$. Intuitively, in (2), we view the locations of $w$ as public and those of $w' \ominus \rho$ as private. The integral sign denotes a *coend*, which in this case is just an ordinary colimit in $\mathbf{Set}$ and is computed as a quotient of $\sum_{\rho \colon w \to w' \in w \downarrow \mathsf{u}} Xw'$ under the equivalence relation $\sim$ obtained as the symmetric-transitive closure of the relation $\preceq$:

$$(\rho \colon w \to w_1,\; x \in Xw_1) \preceq (\mathsf{u}\epsilon \circ \rho \colon w \to w_2,\; (X\epsilon)(x) \in Xw_2) \qquad (\epsilon \colon w_1 \leadsto w_2)$$

Note that $\preceq$ is a preorder. Moreover, it enjoys the following *diamond property*.

**Proposition 5.** *If* $(\rho, x) \preceq (\rho_1, x_1)$ *and* $(\rho, x) \preceq (\rho_2, x_2)$ *then* $(\rho_1, x_1) \preceq (\rho', x')$ *and* $(\rho_2, x_2) \preceq (\rho', x')$ *for a suitable* $(\rho', x')$*. Hence* $(\rho_1, x_1) \sim (\rho_2, x_2)$ *iff* $(\rho_1, x_1) \preceq (\rho, x)$ *and* $(\rho_2, x_2) \preceq (\rho, x)$ *for some* $(\rho, x)$*.*
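The content of Proposition 5 can be checked mechanically on finite instances. The following sketch (our illustration on a hypothetical toy preorder, not part of the paper's formal development) contrasts the symmetric-transitive closure of $\preceq$ with joinability, which coincide in the presence of the diamond property:

```python
# For a preorder with the diamond property, the symmetric-transitive
# closure of ≼ coincides with joinability: a ~ b iff a ≼ c and b ≼ c
# for some common c.  We check this on a toy diamond-shaped relation.
from itertools import product

def reachable(step, a):
    """All elements reachable from a under the one-step relation `step`."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for y in step.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def equiv_closure(step, elems):
    """Symmetric-transitive closure of ≼, as a set of pairs."""
    undirected = {e: set(step.get(e, ())) for e in elems}
    for x, ys in step.items():
        for y in ys:
            undirected.setdefault(y, set()).add(x)
    return {(a, b) for a in elems for b in reachable(undirected, a)}

def joinable(step, elems):
    """Pairs with a common upper bound under the reflexive-transitive ≼."""
    return {(a, b) for a, b in product(elems, repeat=2)
            if reachable(step, a) & reachable(step, b)}

# Toy diamond: a ≼ b, a ≼ c, b ≼ d, c ≼ d.
step = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
elems = ["a", "b", "c", "d"]
assert equiv_closure(step, elems) == joinable(step, elems)
```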

**Example 6.** To illustrate the equivalence relation $\sim$ behind $P$, we revisit the setting of Example 4. Consider the following situations:

Here, the solid lines indicate public locations and the dotted lines indicate private locations. The left equivalence holds because the private locations are not reachable from the public ones via references (depicted as arrows). On the right, although the public parts are equal, the reachable cells of the private parts reveal the distinction, preventing equivalence under $\sim$. Intuitively, hiding identifies those heaps that agree both on their public part and on their reachable private part.

The covariant action of P X (on **E**) is defined via promotion of initializations:

$$(PX)(\epsilon \colon w_1 \leadsto w_2)(\rho \colon w_1 \to w'_1,\; x \in Xw'_1)_\sim = (\mathsf{u}\epsilon \bullet \rho \colon w_2 \to \rho \oplus_{w_1} \mathsf{u}\epsilon,\; X(\rho \bullet \epsilon)(x))_\sim$$

Furthermore, there is a contravariant hiding operation (on $\mathbf{W}$), given by the canonical action of the coend: for $\rho \colon w \to w'$, we define $\mathsf{hide}_\rho \colon PXw' \to PXw$:

$$\mathsf{hide}_\rho(\rho' \colon w' \to w'',\; x \in Xw'')_\sim = (\rho' \circ \rho,\; x)_\sim \tag{3}$$

This allows us to regard $P$ both as a functor $[\mathbf{E}, \mathbf{Set}] \to [\mathbf{E}, \mathbf{Set}]$ and as a functor $[\mathbf{E}, \mathbf{Set}] \to [\mathbf{W}^{\mathrm{op}}, \mathbf{Set}]$.

**Full ground store monad** We now have all the necessary ingredients to obtain the full ground store monad $T$ on $[\mathbf{W}, \mathbf{Set}]$. This monad is assembled by composing the functors in Fig. 1 in the following way. First, observe that $(P(- \times H))^H$ is a standard (global) store monad transform of $P$ on $[\mathbf{E}, \mathbf{Set}]$. This monad is sandwiched between the adjunction $\mathsf{u}^\star \dashv \mathsf{u}_\star$ induced by $\mathsf{u}$ (see Section 2). Since any monad itself resolves into an adjunction, sandwiching it between an adjunction again yields a monad. In summary,

$$T = \left( [\mathbf{W}, \mathbf{Set}] \xrightarrow{\;\mathsf{u}^\star\;} [\mathbf{E}, \mathbf{Set}] \xrightarrow{\;(P(- \times H))^H\;} [\mathbf{E}, \mathbf{Set}] \xrightarrow{\;\mathsf{u}_\star\;} [\mathbf{W}, \mathbf{Set}] \right). \tag{4}$$

**Theorem 7.** *The monad* $T$ *defined by* (4) *is strong.*

*Proof.* The proof is a straightforward generalization of the proof in [9]. □
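As a programming-language analogue of the middle functor in (4): the global store transform $(P(- \times H))^H$ is the familiar "state monad transformer applied to $P$". The following sketch (our illustration only, with $P$ instantiated to the list monad and $H$ to a dictionary-valued store; none of these names come from the paper) spells out the unit, bind, and the get/put operations:

```python
# StateT H List: values of type X are interpreted as functions
# H -> List(X × H), i.e. the global store transform of the list monad.

def unit(x):
    """return: put x next to the unchanged store."""
    return lambda s: [(x, s)]

def bind(m, f):
    """bind: thread the store through both computation stages."""
    return lambda s: [pair for (x, s1) in m(s) for pair in f(x)(s1)]

def get(loc):
    """Read the cell `loc` of the store."""
    return lambda s: [(s[loc], s)]

def put(loc, v):
    """Write v to the cell `loc`, returning the unit value ()."""
    return lambda s: [((), {**s, loc: v})]

# Read a cell, then write back its double.
prog = bind(get("l"), lambda v: put("l", 2 * v))
assert prog({"l": 21}) == [((), {"l": 42})]
```

The categorical construction in (4) additionally restricts and extends along $\mathsf{u}^\star \dashv \mathsf{u}_\star$, which has no counterpart in this simple-minded sketch.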

We can recover the monad previously developed by Plotkin and Power [16] by resorting to the simple store (Example 3).

**Theorem 8.** *Under the simple store model,* $T$ *is isomorphic to the local store monad from [16]:*

$$(TX)w \cong \left(\int^{\rho \colon w \to w' \in w \downarrow \mathbf{W}} Xw' \times \mathcal{V}^{w'}\right)^{\mathcal{V}^w}.$$

Using (4), one obtains the requisite semantics for the language in Fig. 2 via the standard clauses of fine-grain call-by-value [11], except for the clauses for **(put)**, **(get)** and **(new)**, which require special operations of the monad:

$$\begin{aligned} &\mathsf{get} \colon \mathsf{u}^\star\underline{\mathsf{Ref}_S} \times H \to \mathsf{u}^\star\underline{\mathsf{CType}(S)} \times H \\ &\mathsf{put} \colon (\mathsf{u}^\star\underline{\mathsf{Ref}_S} \times \mathsf{u}^\star\underline{\mathsf{CType}(S)}) \times H \to 1 \times H \\ &\mathsf{new} \colon \mathsf{u}^\star\big(\underline{\mathsf{CType}(S)}^{\underline{\mathsf{Ref}_S}}\big) \times H \to P(\mathsf{u}^\star\underline{\mathsf{Ref}_S} \times H) \end{aligned}$$

## **5 Intermezzo: BI-Hyperdoctrines and BI-Algebras**

To be able to give a categorical notion of higher order logic over local store, following Biering et al. [2], we aim to construct a *BI-hyperdoctrine*.

Note that algebraic structures, such as monoids and Heyting algebras, can be straightforwardly internalized in any category with finite products, which gives rise to *internal monoids*, *internal Heyting algebras*, etc. The situation changes when considering non-algebraic properties. In particular, recall that a Heyting algebra $A$ is *complete* iff it has arbitrary joins, which are preserved by binary meets. The corresponding categorical notion is essentially obtained by spelling out generic definitions from internal category theory [6, B2] and is as follows.

**Definition 9 (Internally Complete Heyting Algebras).** An internal Heyting (Boolean) algebra $A$ in a finitely complete category $\mathbf{C}$ is *internally complete* if for every $f \in \mathbf{C}(I, J)$ there exist *indexed joins* $\bigvee_f \colon \mathbf{C}(I, A) \to \mathbf{C}(J, A)$, left order-adjoint to $(-) \circ f \colon \mathbf{C}(J, A) \to \mathbf{C}(I, A)$, such that for any pullback square on the left, the corresponding diagram on the right commutes (*Beck-Chevalley condition*):

$$\begin{array}{ccc} I & \xrightarrow{\;f\;} & J \\ {\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle h} \\ I' & \xrightarrow{\;f'\;} & J' \end{array} \qquad\qquad \begin{array}{ccc} \mathbf{C}(J, A) & \xrightarrow{\;(-) \circ f\;} & \mathbf{C}(I, A) \\ {\scriptstyle \bigvee_{h}}\big\downarrow & & \big\downarrow{\scriptstyle \bigvee_{g}} \\ \mathbf{C}(J', A) & \xrightarrow{\;(-) \circ f'\;} & \mathbf{C}(I', A) \end{array}$$

It follows generally that the existence of indexed joins $\bigvee$ implies the existence of indexed meets $\bigwedge$, which then satisfy dual conditions ([6, Corollary 2.4.8]).

**Remark 10 (Binary Joins/Meets).** The adjointness condition for indexed joins means precisely that $\bigvee_f \phi \leqslant \psi$ iff $\phi \leqslant \psi \circ f$ for every $\phi \colon I \to A$ and every $\psi \colon J \to A$. If $\mathbf{C}$ has binary coproducts, by taking $f = \nabla \colon X + X \to X$ we obtain that $\bigvee_\nabla \phi \leqslant \psi$ iff $\phi \leqslant [\psi, \psi]$ iff $\phi \circ \mathsf{inl} \leqslant \psi$ and $\phi \circ \mathsf{inr} \leqslant \psi$. This characterizes $\bigvee_\nabla [\phi_1, \phi_2] \colon X \to A$ as the binary join of $\phi_1, \phi_2 \colon X \to A$. Binary meets are characterized analogously.

**Definition 11 ((First Order) (BI-)Hyperdoctrine).** Let $\mathbf{C}$ be a category with finite products. A *first order hyperdoctrine over* $\mathbf{C}$ is a functor $S \colon \mathbf{C}^{\mathrm{op}} \to \mathbf{Poset}$ with the following properties:


If additionally

$$\frac{\Gamma \vdash v \colon A \quad \Gamma \vdash \phi \colon \mathsf{P}A}{\Gamma \vdash \phi(v) \colon \mathsf{prop}} \qquad \frac{\Gamma,\, x \colon A \vdash \phi \colon \mathsf{prop}}{\Gamma \vdash x.\,\phi \colon \mathsf{P}A} \qquad \frac{\Gamma \vdash v \colon \mathsf{Ref}_S \quad \Gamma \vdash u \colon \mathsf{CType}(S)}{\Gamma \vdash v \hookrightarrow u \colon \mathsf{prop}}$$

$$\frac{\Gamma \vdash \phi \colon \mathsf{P}A}{\Gamma \vdash Q\,\phi \colon \mathsf{prop}}\ (Q \in \{\exists, \forall\}) \qquad \frac{\Gamma \vdash v \colon A \quad \Gamma \vdash u \colon A}{\Gamma \vdash v = u \colon \mathsf{prop}} \qquad \frac{\Gamma \vdash \phi \colon \mathsf{prop} \quad \Gamma \vdash \psi \colon \mathsf{prop}}{\Gamma \vdash \phi \mathbin{\square} \psi \colon \mathsf{prop}}\ (\square \in \{\wedge, \vee, \Rightarrow, \star, {-}\!{\star}\}) \qquad \frac{}{\Gamma \vdash \top, \bot \colon \mathsf{prop}}$$

Fig. 4: Term formation rules for the higher order separation logic.


then S is called a *first order BI-hyperdoctrine*.

In a *(higher order) hyperdoctrine*, $\mathbf{C}$ is additionally required to be Cartesian closed, and every $SX$ is required to be poset-isomorphic to $\mathbf{C}(X, A)$, naturally in $X$, for a suitable internal Heyting algebra $A \in |\mathbf{C}|$. Such a hyperdoctrine is a *BI-hyperdoctrine* if moreover $A$ is an internal BI-algebra.

**Proposition 12.** *Every internally complete Heyting algebra* $A$ *in a Cartesian closed category* $\mathbf{C}$ *with finite limits gives rise to a canonical hyperdoctrine* $\mathbf{C}(-, A)$*: for every* $X$*,* $\mathbf{C}(X, A)$ *is a poset under* $f \leqslant g$ *iff* $f \wedge g = f$*.*

*Proof.* Clearly, every $\mathbf{C}(X, A)$ is a Heyting algebra and every $\mathbf{C}(f, A)$ is a Heyting algebra morphism. The quantifiers are defined mutually dually as follows:

$$(\exists Y)\_X(\phi \colon X \times Y \to A) = \bigvee\_{\text{fst} \colon \, X \times Y \to X} \phi,$$

$$(\forall Y)\_X(\phi \colon X \times Y \to A) = \bigwedge\_{\text{fst} \colon \, X \times Y \to X} \phi.$$

Naturality in X follows from the corresponding Beck-Chevalley conditions.

Finally, internal equality $=_X \colon X \times X \to A$ is defined as $\bigvee_{\langle \mathrm{id}_X, \mathrm{id}_X \rangle} \top$. □
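For $\mathbf{C} = \mathbf{Set}$ restricted to finite sets, the indexed joins and meets along the projection $\mathsf{fst}$ are just the usual quantifiers in the hyperdoctrine $\mathbf{Set}(-, 2)$. A small sketch (our illustration; the helper names are ours):

```python
# Indexed joins along fst : X×Y → X compute the existential quantifier;
# indexed meets compute the universal one.  Predicates X×Y → 2 are
# represented as dicts from pairs to booleans.

def exists_along_fst(phi, X, Y):
    """(∃Y)_X φ : X → 2 for φ : X×Y → 2, i.e. a join over the fibre."""
    return {x: any(phi[(x, y)] for y in Y) for x in X}

def forall_along_fst(phi, X, Y):
    """(∀Y)_X φ : X → 2, a meet over the fibre."""
    return {x: all(phi[(x, y)] for y in Y) for x in X}

X, Y = [0, 1], [0, 1, 2]
phi = {(x, y): x <= y for x in X for y in Y}   # φ(x, y) = (x ≤ y)
assert exists_along_fst(phi, X, Y) == {0: True, 1: True}
assert forall_along_fst(phi, X, Y) == {0: True, 1: False}
```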

A standard way to obtain an (internally) complete BI-algebra is to resort to ordered partial commutative monoids [18].

**Definition 13 (Ordered PCM [18]).** An *ordered partial commutative monoid (pcm)* is a tuple $(M, E, \cdot\,, \leqslant)$ where $M$ is a set, $E \subseteq M$ is a set of *units*, *multiplication* $\cdot$ is a partial binary operation on $M$, and $\leqslant$ is a preorder on $M$, satisfying a number of axioms (see [18] for details).

We note that, using general recipes [3], for every internal ordered pcm $M$ in a topos $\mathbf{C}$ with subobject classifier $\Omega$, $\mathbf{C}(- \times M, \Omega)$ forms a BI-hyperdoctrine; in particular, if $\mathbf{C} = \mathbf{Set}$ then $\mathbf{Set}(- \times M, 2)$ is a BI-hyperdoctrine.

## **6 A Higher Order Logic for Full Ground Store**

We proceed to develop a local version of separation logic using the semantic principles explored in the previous sections. That is, we seek an interpretation of the language in Fig. 4 in the category $[\mathbf{W}, \mathbf{Set}]$, over the type system (1) extended with *predicate types* $\mathsf{P}A$. The judgements $\Gamma \vdash \phi \colon \mathsf{prop}$ type formulas depending on a variable context $\Gamma$. Additionally, we have judgements of the form $\Gamma \vdash \phi \colon \mathsf{P}A$ for *predicates in context*. Both kinds of judgements are mutually convertible using the standard application-abstraction routine. Note that expressions for quantifiers $\exists x.\,\phi$ are thus obtained in two steps: by forming a predicate $x.\,\phi$, and subsequently applying $\exists$. Apart from the standard logical connectives, we postulate *separating conjunction* $\star$ and *separating implication* $\mathbin{{-}\!{\star}}$.

Our goal is to build a BI-hyperdoctrine, using the recipes summarized in the previous section. That is, we construct a certain internal BI-algebra $\Theta$ in $[\mathbf{W}, \mathbf{Set}]$, and subsequently conclude that $[-, \Theta]$ is the BI-hyperdoctrine in question. In what follows, most of the effort is invested into constructing an internally complete Boolean algebra $\check{\mathcal{P}} \circ (\hat{P}\hat{H})$ (hence $[-, \check{\mathcal{P}} \circ (\hat{P}\hat{H})]$ is a hyperdoctrine), from which $\Theta$ is carved out as a subfunctor, identified by an upward closure condition. Here, $\check{\mathcal{P}}$ is a contravariant powerset functor, and $\hat{P}$ and $\hat{H}$ are certain modifications of the hiding and heap functors from Section 4. As we shall see, the move from $\check{\mathcal{P}} \circ (\hat{P}\hat{H})$ to $\Theta$ remedies the problem that the natural separating conjunction operator $\star$ on the former has no unit (Remark 19).

In order to model resource separation, we must identify a domain of logical assertions over partial heaps, i.e. heaplets, instead of total heaps. We thus need to derive a unary (covariant) heaplet functor from the binary, mixed-variant one $\mathcal{H}$ used before. We must still cope not only with heaplets, but with partially hidden heaplets, to model information hiding. A seemingly natural candidate functor for hidden heaplets is the composite

$$P\left(\mathbf{E} \xrightarrow{\;\sum_{w \subseteq -} \mathcal{H}(w, -)\;} \mathbf{Set}\right) \colon \mathbf{W}^{\mathrm{op}} \to \mathbf{Set}.$$

One problem with this definition is that the equivalence relation $\sim$ underlying the construction of $P$ (2) is too fine. Consider, for example, $e_w = (\varnothing \subseteq w, \star) \in \sum_{w' \subseteq w} \mathcal{H}(w', w)$. Then $(\mathrm{id} \colon w \to w,\ e_w) \nsim (\mathsf{inl} \colon w \to w \oplus \{\star \colon 1\},\ e_{w \oplus \{\star \colon 1\}})$, i.e. two hidden heaplets would not be equivalent if one extends the other by an inaccessible hidden cell. In order to arrive at a more reasonable model of logical assertions, we modify the previous model by replacing the category $\mathbf{E}$ of initializations with a category $\hat{\mathbf{E}}$ of *partial initializations*. This will induce a hiding monad $\hat{P}$ over $[\hat{\mathbf{E}}, \mathbf{Set}]$ using exactly the same formula (2) as for $P$.

A partial initialization is a pair $(\rho, \eta)$ with $\rho \in \mathbf{W}(w_1, w_2)$ and $\eta \in \sum_{w^- \subseteq w_2 \ominus \rho} \mathcal{H}(w^-, w_2)$. Let $\hat{\mathbf{E}}$ be the category of heap layouts and partial initializations. Analogously to $\mathsf{u}$, there is an obvious partial-heap-forgetting functor $\hat{\mathsf{u}} \colon \hat{\mathbf{E}} \to \mathbf{W}$. Let $\hat{H} \colon \hat{\mathbf{E}} \to \mathbf{Set}$ be the following *heaplet functor*:

$$
\hat{H}w = \sum\_{w' \subseteq w} \mathcal{H}(w', w).
$$

Given a partial initialization $\epsilon = (\rho \colon w \to w',\ (w'' \subseteq w' \ominus \rho,\ \eta \in \mathcal{H}(w'', w'))) \colon w \leadsto w'$, $\hat{H}\epsilon \colon \hat{H}w \to \hat{H}w'$ extends a given heaplet over $w$ to a heaplet over $w'$ via $\eta$:

$$(\hat{H}\epsilon)(w_1 \subseteq w,\; \eta' \in \mathcal{H}(w_1, w)) = (\rho[w_1] \cup w'' \subseteq w',\; \eta'')$$

where $\eta'' \in \mathcal{H}(\rho[w_1] \cup w'', w')$ is defined as follows:

$$\mathsf{pr}_{\rho(\ell \colon S)}\,\eta'' = \mathsf{range}(S)(\rho)(\mathsf{pr}_{(\ell \colon S)}\,\eta') \qquad ((\ell \colon S) \in w_1)$$

$$\mathsf{pr}_{(\ell \colon S)}\,\eta'' = \mathsf{pr}_{(\ell \colon S)}\,\eta \qquad ((\ell \colon S) \in w'')$$

With $\hat{\mathbf{E}}$ and $\hat{H}$ as above instead of $\mathbf{E}$ and $H$, the framework described in Section 4 carries over coherently.

**Remark 14.** Let us fix a fresh symbol $\boxtimes$, and note that

$$\hat{H}w = \sum\_{w' \subseteq w} \prod\_{(\ell \colon S) \in w'} \mathsf{range}(S)(w) \cong \prod\_{(\ell \colon S) \in w} (\mathsf{range}(S)(w) \uplus \{\boxtimes\}),$$

meaning that the passage from $\mathbf{E}$, $H$ and $P$ to $\hat{\mathbf{E}}$, $\hat{H}$ and $\hat{P}$ is equivalent to extending the range function with designated values $\boxtimes$ for *inaccessible locations*. We prefer to think of $\boxtimes$ this way, and not as the content of *dangling pointers*, to emphasize that we deal with a *reasoning phenomenon* and not with a *programming phenomenon*, for our programs neither create nor process dangling pointers.

For the next proposition we need the following concrete description of the set $\hat{\mathsf{u}}_\star(2^X)w$ as the end $\int_{\rho \colon w \to w' \in w \downarrow \hat{\mathsf{u}}} \mathbf{Set}(Xw', 2)$: this set is a space of dependent functions $\phi$ sending every injection $\rho \colon w \to w'$ to a corresponding subset of $Xw'$ and satisfying the constraint that $x \in \phi(\rho)$ iff $(X\epsilon)(x) \in \phi(\hat{\mathsf{u}}\epsilon \circ \rho)$ for every $\epsilon \colon w' \leadsto w''$.

**Proposition 15.** *The following diagram commutes up to isomorphism:*

*(using the fact that* $[\mathbf{W}, \mathbf{Set}^{\mathrm{op}}]^{\mathrm{op}} \cong [\mathbf{W}^{\mathrm{op}}, \mathbf{Set}]$*), where* $\check{\mathcal{P}}$ *is the contravariant powerset functor* $\check{\mathcal{P}} \colon \mathbf{Set}^{\mathrm{op}} \to \mathbf{Set}$ *and for every* $X \colon \hat{\mathbf{E}} \to \mathbf{Set}$ *the relevant isomorphism* $\Phi_w \colon \hat{\mathsf{u}}_\star(2^X)w \cong \check{\mathcal{P}}(\hat{P}Xw)$ *is as follows:*

$$(\rho \colon w \to w',\; x \in Xw')_\sim \in \Phi_w(\phi \in \hat{\mathsf{u}}_\star(2^X)w) \iff x \in \phi(\rho). \tag{5}$$

Let us clarify the significance of Proposition 15. The exponential $2^{\hat{H}}$ in $[\hat{\mathbf{E}}, \mathbf{Set}]$ can be thought of as a carrier of Boolean predicates over $\hat{H}$, and as we see next, those form an internally complete Boolean algebra, which is carried from $[\hat{\mathbf{E}}, \mathbf{Set}]$ to $[\mathbf{W}, \mathbf{Set}]$ by $\hat{\mathsf{u}}_\star$. The alternative route, via $\hat{P}$ and $\check{\mathcal{P}}$, induces a Boolean algebra of predicates over hidden heaplets $\hat{P}\hat{H}$ directly in $[\mathbf{W}, \mathbf{Set}]$. The isomorphism established in Proposition 15 witnesses the agreement of these two structures.

**Theorem 16.** *For every* $X \colon \hat{\mathbf{E}} \to \mathbf{Set}$*,* $\check{\mathcal{P}} \circ (\hat{P}X)$ *is an internally complete Boolean algebra in* $[\mathbf{W}, \mathbf{Set}]$ *under*

$$\begin{split} \left(\bigvee\_{f} \phi \colon I \to \check{P} \circ (\hat{P}X)\right)\_{w} (j \in Jw) \\ = & \{ (\rho \colon w \to w', x \in Xw')\_{\sim} \mid \exists \, \epsilon \colon w' \leadsto w'', \exists i \in Iw''. \\ \qquad f\_{w''}(i) = J(\hat{\mathsf{u}} \epsilon \circ \rho)(j) \wedge (\mathsf{id}\_{w''}, (X \,\epsilon)(x))\_{\sim} \in \phi\_{w''}(i)\}, \\ \left(\bigwedge\_{f} \phi \colon I \to \check{P} \circ (\hat{P}X)\right)\_{w} (j \in Jw) \\ = & \{ (\rho \colon w \to w', x \in Xw')\_{\sim} \mid \forall \, \epsilon \colon w' \leadsto w'', \forall i \in Iw''. \\ \qquad f\_{w''}(i) = J(\hat{\mathsf{u}} \epsilon \circ \rho)(j) \Rightarrow (\mathsf{id}\_{w''}, (X \,\epsilon)(x))\_{\sim} \in \phi\_{w''}(i)\}. \end{split}$$

*for every* f : I Ñ J*, and the corresponding Boolean algebra operations are computed as set-theoretic unions, intersections and complements.*

By Theorem 16, we obtain a hyperdoctrine $[-, \check{\mathcal{P}} \circ (\hat{P}\hat{H})]$, which provides us with a model of (classical) higher order logic in $[\mathbf{W}, \mathbf{Set}]$. In particular, this allows us to interpret the language from Fig. 4 over $[\mathbf{W}, \mathbf{Set}]$, excluding the separation logic constructs, in such a way that

$$\llbracket \Gamma \vdash \phi \colon \mathsf{prop} \rrbracket \colon \underline{\Gamma} \to \check{\mathcal{P}} \circ (\hat{P}\hat{H}), \qquad \llbracket \Gamma \vdash \phi \colon \mathsf{P}A \rrbracket \colon \underline{\Gamma} \times \underline{A} \to \check{\mathcal{P}} \circ (\hat{P}\hat{H})$$

where <sup>Γ</sup> " <sup>A</sup><sup>1</sup> <sup>ˆ</sup>...ˆAn for <sup>Γ</sup> " px<sup>1</sup> : <sup>A</sup>1,...,xn : <sup>A</sup>n<sup>q</sup> where, additionally to the standard clauses, <sup>P</sup><sup>A</sup> " <sup>P</sup><sup>ˇ</sup> ˝ <sup>P</sup>ˆpu‹<sup>A</sup> <sup>ˆ</sup> <sup>H</sup><sup>ˆ</sup> <sup>q</sup>. The latter interpretation of predicate types PA is justified by the natural isomorphism:

$$(\check{\mathcal{P}} \circ (\hat{P}\hat{H}))^X \cong (\hat{\mathfrak{u}}\_\star(2^{\hat{H}}))^X \cong \hat{\mathfrak{u}}\_\star((2^{\hat{H}})^{\hat{\mathfrak{u}}^\star X}) \cong \check{\mathcal{P}} \circ (\hat{P}(\hat{\mathfrak{u}}^\star X \times \hat{H})) .$$

Here, the first and the last transitions are by $\Phi$ from Proposition 15, and the middle one is due to the fact that both $(\hat{\mathsf{u}}_\star(-))^X$ and $\hat{\mathsf{u}}_\star((-)^{\hat{\mathsf{u}}^\star X})$ are right adjoint to $\hat{\mathsf{u}}^\star(X \times (-))$, hence isomorphic.

Since every set $\hat{H}w$ models a heaplet in the standard sense [18], we can equip $\hat{H}w$ with a standard pointer model structure.

**Proposition 17.** *For every* $w \in |\mathbf{W}|$*,* $(\hat{H}w,\ \{(\varnothing \subseteq w, \star)\},\ \cdot\,,\ \leqslant)$ *is an ordered pcm, where* $\hat{H}w$ *is partially ordered as follows:*

$$\big(w_1 \subseteq w,\; \mathcal{H}(w_1 \subseteq w_2, w)\,\eta \in \mathcal{H}(w_1, w)\big) \leqslant \big(w_2 \subseteq w,\; \eta \in \mathcal{H}(w_2, w)\big) \qquad (w_1 \subseteq w_2)$$

*and for* $w_1 \subseteq w$*,* $w_2 \subseteq w$ *and* $\eta_1 \in \mathcal{H}(w_1, w)$*,* $\eta_2 \in \mathcal{H}(w_2, w)$*,* $(w_1 \subseteq w, \eta_1) \cdot (w_2 \subseteq w, \eta_2)$ *equals* $(w_1 \cup w_2 \subseteq w,\ \eta_1 \cup \eta_2)$ *if* $w_1 \cap w_2 = \varnothing$*, and is undefined otherwise.*
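The pcm of Proposition 17 can be prototyped concretely if a heaplet over a layout $w$ is represented as a finite dictionary defined on a subset of $w$. The following sketch is our simplification: it ignores the world and hiding structure and the typing of cells, keeping only the order and the partial multiplication:

```python
# Heaplets as dicts: multiplication is disjoint union (undefined on
# overlapping domains), and h1 ≤ h2 iff h2 extends h1.

def mult(h1, h2):
    """Partial multiplication: disjoint union, or None if undefined."""
    if set(h1) & set(h2):
        return None                     # overlapping cells: undefined
    return {**h1, **h2}

def leq(h1, h2):
    """h1 ≤ h2 iff h1 is a restriction of h2 (same contents on dom h1)."""
    return all(k in h2 and h2[k] == v for k, v in h1.items())

unit = {}                               # the empty heaplet acts as a unit

h1, h2 = {"l1": 5}, {"l2": 6}
assert mult(h1, h2) == {"l1": 5, "l2": 6}
assert mult(h1, {"l1": 7}) is None      # same cell twice: undefined
assert mult(h1, unit) == h1             # unit law
assert leq(h1, mult(h1, h2))            # multiplication grows heaplets
```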

As indicated in Section 5, we automatically obtain a BI-algebra structure on the set of all subsets of $\hat{H}w$. The same strategy does not apply to $\hat{P}\hat{H}w$, roughly because we cannot predict the mutual arrangement of the hidden partitions of two heaplets with respect to each other, for we do not have a global reference space for pointers, in contrast to the standard separation logic setting. We thus define a separating conjunction operator directly on every $\check{\mathcal{P}}(\hat{P}\hat{H}w)$ as follows:

$$\begin{aligned} \phi \star_w \psi = \{ (\rho \colon w \to w',\; (w_1 \uplus w_2 \subseteq w',\; \eta \in \mathcal{H}(w_1 \uplus w_2, w')))_\sim \mid {} \\ (\rho, (w_1 \subseteq w', \mathcal{H}(w_1 \subseteq w_1 \uplus w_2, w')\,\eta))_\sim \in \phi, \\ (\rho, (w_2 \subseteq w', \mathcal{H}(w_2 \subseteq w_1 \uplus w_2, w')\,\eta))_\sim \in \psi \}. \end{aligned}$$

**Lemma 18.** *The operator* $\star_w$ *on* $\check{\mathcal{P}}(\hat{P}\hat{H}w)$ *satisfies the following properties.*


Property (3) specifically tells us that any representative of an equivalence class contained in a separating conjunction can be split in such a way that the respective pieces belong to the arguments of the separating conjunction.
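This splitting reading of separating conjunction can be illustrated on the same dictionary representation of heaplets. The sketch below is our illustration in the standard non-hidden setting (so, unlike $\star_w$, it does have the emptiness predicate as a unit); it enumerates all two-way splits of a heaplet:

```python
# Separating conjunction of predicates on heaplets-as-dicts:
# (φ ⋆ ψ)(h) holds iff h splits into disjoint h1, h2 with φ(h1), ψ(h2).
from itertools import combinations

def splits(h):
    """All pairs (h1, h2) with disjoint domains and h1 ∪ h2 = h."""
    keys = list(h)
    for r in range(len(keys) + 1):
        for ks in combinations(keys, r):
            h1 = {k: h[k] for k in ks}
            h2 = {k: h[k] for k in keys if k not in ks}
            yield h1, h2

def sep_conj(phi, psi):
    """φ ⋆ ψ as a predicate on heaplets."""
    return lambda h: any(phi(h1) and psi(h2) for h1, h2 in splits(h))

points_to = lambda l, v: (lambda h: h.get(l) == v and len(h) == 1)
emp = lambda h: h == {}

star = sep_conj(points_to("l1", 5), points_to("l2", 6))
assert star({"l1": 5, "l2": 6})
assert not star({"l1": 5})              # no cells left for second conjunct
assert sep_conj(points_to("l1", 5), emp)({"l1": 5})   # emp is a unit here
```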

**Remark 19.** The only candidate for the unit of the separating conjunction $\star_w$ would be the emptiness predicate $\mathsf{empty}_w \colon 1 \to \check{\mathcal{P}}(\hat{P}\hat{H}w)$, identifying precisely the empty heaplets. However, $\mathsf{empty}_w$ is not natural in $w$. In fact, it follows from the Yoneda lemma that there are exactly two natural transformations $1 \to \check{\mathcal{P}} \circ (\hat{P}\hat{H})$, namely total truth and total falsity, neither of which is a unit for $\star_w$.

Remark 19 provides a formal argument for why we cannot interpret classical separation logic over $\check{\mathcal{P}} \circ (\hat{P}\hat{H})$. We thus proceed to identify, for every $w$, a subset of $\check{\mathcal{P}}(\hat{P}\hat{H}w)$ for which the total truth predicate becomes the unit of the separating conjunction. Concretely, let $\Theta$ be the subfunctor of $\check{\mathcal{P}} \circ (\hat{P}\hat{H})$ identified by the following *upward closure condition*: $\phi \in \Theta w$ if

$$(\rho, \eta)_\sim \in \phi \ \text{ and }\ \eta \leqslant \eta' \qquad \text{imply} \qquad (\rho, \eta')_\sim \in \phi.$$

**Lemma 20.** $\Theta$ *is an internal complete sublattice of* $\check{\mathcal{P}} \circ (\hat{P}\hat{H})$*, i.e. the inclusion* $\iota \colon \Theta \to \check{\mathcal{P}} \circ (\hat{P}\hat{H})$ *preserves all meets and all joins. This canonically equips* $\Theta$ *with an internally complete Heyting algebra structure.*

*Proof (Sketch).* The key idea is to establish a retraction $(\iota, \mathsf{cl})$ with $\mathsf{cl} \circ \iota = \mathrm{id}$. The requisite structure is then transferred from $\check{\mathcal{P}} \circ (\hat{P}\hat{H})$ to $\Theta$ along it. The Heyting implication for $\Theta$ is obtained using the standard formula $(\phi \Rightarrow \psi) = \bigvee\{\xi \mid \phi \wedge \xi \leqslant \psi\}$, interpreted in the internal language. □

**Lemma 21.** *Separating conjunction preserves upward closure: for* $\phi, \psi \in \Theta w$*,* $\phi \star_w \psi = \mathsf{cl}_w(\phi \star_w \psi)$*.*

**Lemma 22.** $\Theta$ *is a BI-algebra:* $\star_w$ *is obtained by restriction from* $\check{\mathcal{P}}(\hat{P}\hat{H}w)$ *by Lemma 21,* $\hat{P}\hat{H}w$ *is the unit for it, and*

$$\begin{aligned} \phi \mathbin{{-}\!{\star}}_w \psi = \{ (\rho, \eta)_\sim \in \Theta w \mid {} & \forall \rho' \colon w \to w',\ \forall \eta_1, \eta_2 \in \hat{H}w'.\ \eta_1 \cdot \eta_2 \text{ defined} \wedge {} \\ & (\rho, \eta) \sim (\rho', \eta_1) \wedge (\rho', \eta_2)_\sim \in \phi \Rightarrow (\rho', \eta_1 \cdot \eta_2)_\sim \in \psi \}. \end{aligned}$$

**–** $s, \rho, \eta \vDash \top$
**–** $s, \rho, \eta \vDash \phi \wedge \psi$ if $s, \rho, \eta \vDash \phi$ and $s, \rho, \eta \vDash \psi$
**–** $s, \rho, \eta \vDash \phi \vee \psi$ if $s, \rho, \eta \vDash \phi$ or $s, \rho, \eta \vDash \psi$
**–** $s, \rho, \eta \vDash \phi \Rightarrow \psi$ if for all $(\rho, \eta) \sim (\rho', \eta')$ and $\eta' \leqslant \eta''$, $s, \rho', \eta'' \vDash \phi$ implies $s, \rho', \eta'' \vDash \psi$
**–** $s, \rho, \eta \vDash \phi(v)$ if $s, \rho, ((\llbracket \Gamma \vdash_v v \colon A \rrbracket_{w'} \circ \underline{\Gamma}\rho)(s), \eta) \vDash \phi$
**–** $s, \rho, (a, \eta) \vDash x.\,\phi$ if $a = (\underline{A}\rho)(b)$ and $(s, b), \rho, \eta \vDash \phi$
**–** $s, \rho, \eta \vDash v \hookrightarrow u$ if $\eta = (w'' \subseteq w', \delta \in \mathcal{H}(w'', w'))$ and $\delta(r \colon S) = (\llbracket \Gamma \vdash_v u \colon \mathsf{CType}(S) \rrbracket_{w'} \circ \underline{\Gamma}\rho)(s)$, where $(\llbracket \Gamma \vdash_v v \colon \mathsf{Ref}_S \rrbracket_{w'} \circ \underline{\Gamma}\rho)(s) = (r \colon S) \in w''$
**–** $s, \rho, \eta \vDash v = u$ if $(\llbracket \Gamma \vdash_v v \colon A \rrbracket_{w''} \circ \underline{\Gamma}\rho' \circ \underline{\Gamma}\rho)(s) = (\llbracket \Gamma \vdash_v u \colon A \rrbracket_{w''} \circ \underline{\Gamma}\rho' \circ \underline{\Gamma}\rho)(s)$ for some $\rho' \colon w' \to w''$
**–** $s, \rho, \eta \vDash \phi \star \psi$ if, for suitable $w_1$, $w_2$, $\eta \in \mathcal{H}(w_1 \uplus w_2, w')$, $s, \rho, (w_1 \subseteq w', \mathcal{H}(w_1 \subseteq w_1 \uplus w_2, w')\,\eta) \vDash \phi$ and $s, \rho, (w_2 \subseteq w', \mathcal{H}(w_2 \subseteq w_1 \uplus w_2, w')\,\eta) \vDash \psi$
**–** $s, \rho, \eta \vDash \phi \mathbin{{-}\!{\star}} \psi$ if for all $(\rho', \eta_1) \sim (\rho, \eta)$ and for all $\eta_2$ such that $\eta_1 \cdot \eta_2$ is defined, $s, \rho', \eta_2 \vDash \phi$ implies $s, \rho', \eta_1 \cdot \eta_2 \vDash \psi$
**–** $s, \rho, \eta \vDash \exists\phi$ if $\underline{\Gamma}(\hat{\mathsf{u}}\epsilon \circ \rho)(s), \mathrm{id}_{w''}, (a, (\hat{H}\epsilon)(\eta)) \vDash \phi$ for some $\epsilon \colon w' \leadsto w''$ and $a \in \underline{A}w''$
**–** $s, \rho, \eta \vDash \forall\phi$ if $\underline{\Gamma}(\hat{\mathsf{u}}\epsilon \circ \rho)(s), \mathrm{id}_{w''}, (a, (\hat{H}\epsilon)(\eta)) \vDash \phi$ for all $\epsilon \colon w' \leadsto w''$ and $a \in \underline{A}w''$

Fig. 5: Semantics of the logic.

*Proof.* In view of Lemma 20, it remains to show that the given operations are natural and that $\Theta$ is an internal BI-algebra w.r.t. them. Since BI-algebras form a variety [5], it suffices to show that each $\Theta w$ is a BI-algebra. By Lemma 18 (ii), it suffices to show that every $(-) \star_w \phi$ preserves arbitrary joins, for then we can use the standard formula to calculate $\phi \mathbin{{-}\!{\star}}_w \psi$, which happens to be natural in $w$:

$$\phi \mathbin{{-}\!{\star}}_w \psi = \bigcup \{ \xi \mid \phi \star_w \xi \leqslant \psi \}.$$

By unfolding the right-hand side, we obtain the expression for $\mathbin{{-}\!{\star}}_w$ figuring in the statement of the lemma. □

**Theorem 23.** $\Theta$ *is an internally complete Heyting BI-algebra, hence* $[-, \Theta]$ *is a BI-hyperdoctrine.*

*Proof.* Follows from Lemmas 20 and 22. □

This now provides us with a complete semantics of the language in Fig. 4, with $\llbracket \Gamma \vdash \phi \colon \mathsf{prop} \rrbracket \colon \underline{\Gamma} \to \Theta$ and $\llbracket \Gamma \vdash \phi \colon \mathsf{P}A \rrbracket \colon \underline{\Gamma} \to \underline{\mathsf{P}A}$, where $\underline{\mathsf{P}A}$ is the upward closed subfunctor of $\check{\mathcal{P}} \circ (\hat{P}(\hat{\mathsf{u}}^\star\underline{A} \times \hat{H}))$, with upward closure only on the $\hat{H}$-part, which is isomorphic to $\Theta^{\underline{A}}$. The resulting semantics is defined in Fig. 5, where we write $s, \rho, \eta \vDash \phi$ for $(\rho, \eta)_\sim \in \llbracket \Gamma \vdash \phi \colon \mathsf{prop} \rrbracket(s)$ and $s, \rho, (a, \eta) \vDash \phi$ for $(\rho, (a, \eta))_\sim \in \llbracket \Gamma \vdash \phi \colon \mathsf{P}A \rrbracket(s)$. The following properties [4] are then automatic.

**Proposition 24. –** (Monotonicity) *If* s, ρ, η |ù φ *and* η ď η<sup>1</sup> *then* s, ρ, η<sup>1</sup> |ù φ*.* **–** (Shrinkage) *If* s, ρ, η |ù φ*,* η<sup>1</sup> ď η *and* η<sup>1</sup> *contains all cells reachable from* s *and* w *then* s, ρ, η<sup>1</sup> |ù φ*.*

## **7 Examples**

Let us illustrate subtle features of our semantics by some examples.

**Example 25.** Consider the formula $\exists \ell \colon \mathsf{Ref}_{Int}.\ \ell \hookrightarrow 5$ from the introduction, in the empty context. Then $-, \rho, \eta \vDash \exists \ell.\ \ell \hookrightarrow 5$ iff for some $\epsilon \colon w' \leadsto w_2$ and some $x \in \underline{\mathsf{Ref}_{Int}}\,w_2$, $x, \mathrm{id}_{w_2}, (\hat{H}\epsilon)(\eta) \vDash \ell \hookrightarrow 5$. The latter is true iff $\mathsf{pr}_x((\hat{H}\epsilon)(\eta)) = 5$. Note that $w'$ need not contain such a location, but it is always possible to choose $\epsilon$ so that $w_2$ contains one and $\mathsf{pr}_x((\hat{H}\epsilon)(\eta)) = 5$. Hence, the original formula is always valid.

**Example 26.** The clauses in Fig. 5 are very similar to the standard Kripke semantics of intuitionistic logic. Note, however, that the clause for implication strikingly differs from the expected one

$$\text{–}\ \ s, \rho, \eta \vDash \phi \Rightarrow \psi \quad \text{if} \quad \text{for all } \eta \leqslant \eta',\ s, \rho, \eta' \vDash \phi \text{ implies } s, \rho, \eta' \vDash \psi,$$

though. The latter is indeed not validated by our semantics, as witnessed by the following example. Consider the following formulas $\phi$ and $\psi$, respectively:

$$\ell \colon \mathsf{Ref}\_{\mathsf{Ref}\_{Int}} \vdash \exists \ell' . \exists x. \ell \hookrightarrow \ell' \land \ell' \hookrightarrow x \colon \text{prop} \tag{6}$$

$$\ell \colon \mathsf{Ref}\_{\mathsf{Ref}\_{Int}} \vdash \exists \ell' . \ell \hookrightarrow \ell' \land \ell' \hookrightarrow 6 \colon \text{prop} \tag{7}$$

The first formula is valid over heaplets in which $\ell$ refers to a reference to some integer, while the second one is valid only over heaplets in which $\ell$ refers to a reference to 6. Any $\eta' \geqslant \eta = (\mathrm{id}_w, (\{\ell''\} \subseteq \{\ell, \ell''\}, [\ell'' \mapsto 6]))$ satisfies both (6) and (7) or neither of them. However, the implication $\phi \Rightarrow \psi$ is still not valid over $\eta$ in our semantics, for

$$\begin{split} \eta &\sim (w \hookrightarrow w \oplus (\ell' \colon Int), (\{\ell', \ell''\} \subseteq \{\ell, \ell', \ell''\}, [\ell' \mapsto 5, \ell'' \mapsto 6])) \\ &\leqslant (w \hookrightarrow w \oplus (\ell' \colon Int), (\{\ell, \ell', \ell''\} \subseteq \{\ell, \ell', \ell''\}, [\ell \mapsto \ell', \ell' \mapsto 5, \ell'' \mapsto 6])) \end{split}$$

and the latter heaplet validates φ but not ψ.
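The counterexample can be replayed in miniature by modelling heaplets as partial maps (dictionaries). The following Python sketch is our own illustration, not the paper's formal semantics: `phi` and `psi` encode formulas (6) and (7), and the assertions check that extensions of η within its own world satisfy both or neither, while extending the world by a fresh cell holding 5 separates them.

```python
# Heaplets as dicts mapping locations to their contents.
# phi: exists l', x.  l -> l'  and  l' -> x     (l holds a reference to some int)
# psi: exists l'.     l -> l'  and  l' -> 6     (l holds a reference to 6)

def phi(h):
    return "l" in h and h["l"] in h and isinstance(h[h["l"]], int)

def psi(h):
    return "l" in h and h["l"] in h and h[h["l"]] == 6

# The heaplet eta over the world {l: Ref Ref Int, l'': Ref Int} defines only l''.
eta = {"l''": 6}

# Extensions of eta within the same world: l can only point to the existing
# integer cell l''.  Every such extension satisfies both phi and psi, or neither.
extensions_same_world = [eta, {**eta, "l": "l''"}]
assert all(phi(h) == psi(h) for h in extensions_same_world)

# But first extending the world by a fresh cell l' holding 5, and then letting
# l point to it, yields a heaplet validating phi but not psi:
extended = {"l''": 6, "l'": 5, "l": "l'"}
assert phi(extended) and not psi(extended)
```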

**Example 27.** Least (μ) and greatest (ν) fixpoints can be encoded in higher-order logic [2]. As an example, consider

$$isList = \mu\gamma.\,\lambda\ell.\ \ell \hookrightarrow null \lor \exists \ell', x.\, \ell \hookrightarrow (x,\ell') \star \gamma(\ell'),$$

which specifies the fact that ℓ is a pointer to the head of a list (eliding the coproduct injections in inl *null* and inr(x, ℓ')). By definition, *isList* satisfies the following recursive equation:

$$isList(\ell) = \ell \hookrightarrow null \lor \exists \ell', x. \ell \hookrightarrow (x, \ell') \star isList(\ell')$$

Let us expand the semantics of the right hand side. We have

$$\begin{split} & [\![\ell\colon \mathsf{Ref}_{\mathsf{list}},\, isList\colon \mathsf{P}(\mathsf{Ref}_{\mathsf{list}}) \vdash \ell \hookrightarrow null \,\vee\, \exists \ell', x.\, \ell \hookrightarrow (x,\ell') \star isList(\ell')]\!]_w(isList) \\ & = \{(\rho\colon w \to w', (\mathsf{Ref}_{\mathsf{list}}\rho)(\ell), \delta \in \hat{H}w')_{\sim} \mid \mathrm{pr}_{\rho(\ell)}(\delta) = null\} \;\cup \\ & \quad\ \ [\![\ell\colon \mathsf{Ref}_{\mathsf{list}},\, isList\colon \mathsf{P}(\mathsf{Ref}_{\mathsf{list}}) \vdash \exists \ell', x.\, \ell \hookrightarrow (x,\ell') \star isList(\ell')]\!]_w(isList) \\ & = \{(\rho\colon w \to w', (\mathsf{Ref}_{\mathsf{list}}\rho)(\ell), \delta \in \hat{H}w')_{\sim} \mid \\ & \qquad \mathrm{pr}_{\rho(\ell)}(\delta) = null \,\vee\, \exists \ell', x.\, \mathrm{pr}_{\rho(\ell)}\,\delta = (x,\ell') \wedge (\rho, \ell', \delta \smallsetminus \rho(\ell))_{\sim} \in isList\} \end{split}$$

where δ ∖ ρ(ℓ) denotes δ with the cell ρ(ℓ) removed. In summary, (ρ: w → w', (Ref_list ρ)(ℓ), δ ∈ Ĥw')_∼ is in ⟦ℓ: Ref_list, *isList*: P(Ref_list) ⊢ *isList*(ℓ)⟧_w(*isList*) if and only if either pr_{ρ(ℓ)} δ = *null* or there exist ℓ' ∈ w' and x such that pr_{ρ(ℓ)} δ = (x, ℓ') and (ρ, ℓ', δ ∖ ρ(ℓ))_∼ ∈ *isList*.
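Since *isList* is a least fixpoint, it can be approximated by Kleene iteration from the empty predicate. The following Python sketch is an illustrative finite model only (heaps as dictionaries; the separating conjunction and cell removal are elided), not the hyperdoctrine semantics:

```python
# Heap cells: loc -> "null" or a pair (payload, next_loc).
heap = {1: (10, 2), 2: (20, 3), 3: "null", 4: (40, 4)}  # cell 4 is a cycle

def step(s):
    # One unfolding of the recursive equation for isList (separation elided).
    return {l for l, c in heap.items()
            if c == "null" or (isinstance(c, tuple) and c[1] in s)}

# Kleene iteration from the empty predicate reaches the least fixpoint.
s = set()
while step(s) != s:
    s = step(s)

assert s == {1, 2, 3}   # proper null-terminated lists; the cycle at 4 is excluded
```

The least (rather than greatest) fixpoint is what excludes the cyclic cell: it never acquires a null-terminated witness in finitely many unfoldings.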

## **8 Conclusions and Further Work**

Compositionality is an uncontroversially desirable property in semantics and reasoning, which admits strikingly different, but equally valid, interpretations; this becomes particularly instructive when modelling dynamic memory allocation. From the programming perspective it is desirable to provide compositional means for keeping track of the integrity of the underlying data, in particular for preventing *dangling pointers*. Reasoning, however, inherently requires the introduction of partially defined data, such as *heaplets*, which due to the compositionality principle must be regarded as first-class semantic units.

Here we have made a step towards reconciling the recent extensional monad-based denotational semantics for full-ground store [9] with higher-order categorical reasoning frameworks [2] by constructing a suitable intuitionistic BI-hyperdoctrine. Much remains to be done. A highly desirable ingredient, currently missing from our logic in Fig. 4, is a construct relating programs and logical assertions, such as the following dynamic-logic style modality

$$\frac{\varGamma \vdash\_{\mathsf{c}} p \colon A \qquad \varGamma \vdash \phi \colon \mathsf{P} A}{\varGamma \vdash [p]\phi \colon \operatorname{prop}}$$

which would allow us, e.g., to encode *Hoare triples* {φ}p{ψ} in a standard way as implications φ ⇒ [p]ψ. This is difficult due to the outlined discrepancy between the semantics for construction and for reasoning: the categories of initializations for p and φ and the corresponding hiding monads are technically incompatible. In future work we aim to analyse this phenomenon in depth and to develop a semantics for such modalities in a principled fashion.

Orthogonally to these plans, we are interested in a further study of the full-ground store monad and its variants. One interesting research direction is developing algebraic presentations of these monads in terms of operations and equations [17]. Certain generic methods [13] have been proposed for the simple store case (Example 3), and it remains to be seen whether these can be generalized to the full-ground store case.

## **References**



## Quantum Programming with Inductive Datatypes: Causality and Affine Type Theory

Romain Péchoux<sup>1</sup>, Simon Perdrix<sup>1</sup>, Mathys Rennela<sup>2</sup>, and Vladimir Zamdzhiev<sup>1</sup>(✉)

<sup>1</sup> Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France {romain.pechoux,simon.perdrix,vladimir.zamdzhiev}@loria.fr <sup>2</sup> Leiden University, Leiden, The Netherlands m.p.a.rennela@liacs.leidenuniv.nl

Abstract. Inductive datatypes in programming languages allow users to define useful data structures such as natural numbers, lists, trees, and others. In this paper we show how inductive datatypes may be added to the quantum programming language QPL. We construct a sound categorical model for the language and by doing so we provide the first detailed semantic treatment of user-defined inductive datatypes in quantum programming. We also show our denotational interpretation is invariant with respect to big-step reduction, thereby establishing another novel result for quantum programming. Compared to classical programming, this property is considerably more difficult to prove and we demonstrate its usefulness by showing how it immediately implies computational adequacy at all types. To further cement our results, our semantics is entirely based on a physically natural model of von Neumann algebras, which are mathematical structures used by physicists to study quantum mechanics.

Keywords: Quantum programming · Inductive types · Adequacy

## 1 Introduction

Quantum computing is a computational paradigm which takes advantage of quantum mechanical phenomena to perform computation. A quantum computer can solve problems which are out of reach for classical computers (e.g. factorisation of large numbers [24], solving large linear systems [8]). The recent development of quantum technologies points to the necessity of filling the gap between theoretical quantum algorithms and the actual (prototypes of) quantum computers. As a consequence, quantum software, and in particular quantum programming languages, play a key role in the future development of quantum computing. The present paper makes several theoretical contributions towards the design and denotational semantics of quantum programming languages.

Our development is based around the quantum programming language QPL [23] which we extend with inductive datatypes. Our paper is the first to construct a denotational semantics for user-defined inductive datatypes in quantum programming. In the spirit of the original QPL, our type system is affine (discarding of arbitrary variables is allowed, but copying is restricted). We also extend QPL with a copy operation for classical data, because this is an admissible operation in quantum mechanics which improves programming convenience. The addition of inductive datatypes requires a departure from the original denotational semantics of QPL, which are based on finite-dimensional quantum structures, and we consider instead (possibly infinite-dimensional) quantum structures based on W\*-algebras (also known as von Neumann algebras), which have been used by physicists in the study of quantum foundations [25]. As such, our semantic treatment is physically natural and our model is more accessible to physicists and experts in quantum computing compared to most other denotational models.

QPL is a first-order programming language which has procedures, but it does not have lambda abstractions. Thus, there is no use for a !-modality and we show how to model the copy operation by describing the canonical comonoid structure of all classical types (including the inductive ones).

An important notion in quantum mechanics is the idea of causality, which has been formulated in a variety of different ways. In this paper, we consider a simple operational interpretation of causality: if the output of a physical process is discarded, then it does not matter which process occurred [10]. In a symmetric monoidal category **C** with tensor unit I, this can be understood as requiring that for any morphism (process) f : A₁ → A₂, it must be the case that ⋄_{A₂} ∘ f = ⋄_{A₁}, where ⋄_{A_i} : A_i → I is the discarding map (process) at the given object. This notion ties in very nicely with our affine language, because we have to show that the interpretation of values is causal, i.e., values are always discardable.
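In the quantum case discarding is the trace, and a channel given by Kraus operators {K_i} with Σᵢ Kᵢ†Kᵢ = I satisfies the causality equation tr(f(ρ)) = tr(ρ). The following pure-Python check is our own illustration (2×2 matrices only), not part of the paper's development:

```python
# Verify causality (trace preservation) for the measure-and-forget channel
# f(rho) = sum_i K_i rho K_i^dagger, with Kraus operators |0><0| and |1><1|.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def tr(m):
    return m[0][0] + m[1][1]

kraus = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]   # |0><0| and |1><1|

def channel(rho):
    out = [[0, 0], [0, 0]]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

# A valid one-qubit density matrix (Hermitian, positive, trace 1).
rho = [[0.5, 0.5j], [-0.5j, 0.5]]
assert abs(tr(channel(rho)) - tr(rho)) < 1e-12   # discard after f = discard
```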

A major contribution of this paper is that we prove the denotational semantics is invariant with respect to both small-step reduction and big-step reduction. The latter is more difficult in quantum programming and our paper is the first to demonstrate such a result. As a corollary, we obtain computational adequacy.

## 2 Syntax of QPL

The syntax of QPL (including our extensions) is summarised in Figure 1. A well-formed type context, denoted Θ, is simply a list of distinct type variables. A type A is well-formed in type context Θ, denoted Θ ⊢ A, if the judgement can be derived according to the following rules (see [1,6] for a more detailed exposition):

$$\frac{\vdash \Theta}{\Theta \vdash \Theta_i} \qquad \frac{\vdash \Theta}{\Theta \vdash I} \qquad \frac{\vdash \Theta}{\Theta \vdash \mathsf{qbit}} \qquad \frac{\Theta \vdash A \quad \Theta \vdash B}{\Theta \vdash A \star B}\ {\scriptstyle \star \in \{+, \otimes\}} \qquad \frac{\Theta, X \vdash A}{\Theta \vdash \mu X.A}$$

A type A is closed if · ⊢ A. Note that nested type induction is allowed. Henceforth, we implicitly assume that all types we are dealing with are well-formed.

Example 1. The type of natural numbers is defined as Nat ≡ μX.I + X. Lists of a closed type · ⊢ A are defined as List(A) ≡ μY.I + A ⊗ Y.
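Operationally, a value of an inductive type μX.A is a fold applied to a value of the unfolding A[μX.A/X]; for Nat this yields iterated folds of injections. A minimal Python encoding (hypothetical constructors chosen by us, mirroring the value grammar of Section 3) is:

```python
# Encode values of Nat = muX. I + X:  zero = fold(left *), s(n) = fold(right n).
def zero():
    return ("fold", ("left", "*"))

def succ(n):
    return ("fold", ("right", n))

def to_int(n):
    # Peel one fold and one injection per Nat constructor.
    tag, inj = n
    assert tag == "fold"
    side, payload = inj
    return 0 if side == "left" else 1 + to_int(payload)

three = succ(succ(succ(zero())))
assert to_int(three) == 3
```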

Notice that our type system is not equipped with a !-modality. Indeed, in the absence of function types, there is no reason to introduce it. Instead, we specify

```
Types               A, B ::= X | I | qbit | A + B | A ⊗ B | μX.A
Classical Types     P, R ::= X | I | P + R | P ⊗ R | μX.P
Terms               M, N ::= new unit u | discard x | y = copy x | new qbit q
                           | b = measure q | q1,...,qn *= S | M; N | skip
                           | while b do M | y = left_{A,B} x | y = right_{A,B} x
                           | case y of {left x1 -> M | right x2 -> N}
                           | x = (x1, x2) | (x1, x2) = x | y = fold x | y = unfold x
                           | proc f :: x : A -> y : B {M} | y = f(x)
Variable contexts   Γ, Σ ::= x1 : A1, ..., xn : An
Procedure contexts  Π    ::= f1 : A1 -> B1, ..., fn : An -> Bn
```

$$\frac{}{\Pi \vdash \langle \Gamma \rangle\ \mathsf{new\ unit}\ u\ \langle \Gamma, u : I \rangle} \qquad \frac{}{\Pi \vdash \langle \Gamma, x : A \rangle\ \mathsf{discard}\ x\ \langle \Gamma \rangle} \qquad \frac{}{\Pi \vdash \langle \Gamma \rangle\ \mathsf{skip}\ \langle \Gamma \rangle}$$

$$\frac{P \text{ is a classical type}}{\Pi \vdash \langle \Gamma, x : P \rangle\ y = \mathsf{copy}\ x\ \langle \Gamma, x : P, y : P \rangle} \qquad \frac{\Pi \vdash \langle \Gamma \rangle\ M\ \langle \Gamma' \rangle \quad \Pi \vdash \langle \Gamma' \rangle\ N\ \langle \Sigma \rangle}{\Pi \vdash \langle \Gamma \rangle\ M; N\ \langle \Sigma \rangle}$$

$$\frac{\Pi \vdash \langle \Gamma, b : \mathsf{bit} \rangle\ M\ \langle \Gamma, b : \mathsf{bit} \rangle}{\Pi \vdash \langle \Gamma, b : \mathsf{bit} \rangle\ \mathsf{while}\ b\ \mathsf{do}\ M\ \langle \Gamma, b : \mathsf{bit} \rangle} \qquad \frac{}{\Pi \vdash \langle \Gamma \rangle\ \mathsf{new\ qbit}\ q\ \langle \Gamma, q : \mathsf{qbit} \rangle}$$

$$\frac{}{\Pi \vdash \langle \Gamma, q : \mathsf{qbit} \rangle\ b = \mathsf{measure}\ q\ \langle \Gamma, b : \mathsf{bit} \rangle} \qquad \frac{S \text{ is a unitary of arity } n}{\Pi \vdash \langle \Gamma, q_1 : \mathsf{qbit}, \ldots, q_n : \mathsf{qbit} \rangle\ q_1, \ldots, q_n \mathrel{*}= S\ \langle \Gamma, q_1 : \mathsf{qbit}, \ldots, q_n : \mathsf{qbit} \rangle}$$

$$\frac{}{\Pi \vdash \langle \Gamma, x : A \rangle\ y = \mathsf{left}_{A,B}\ x\ \langle \Gamma, y : A + B \rangle} \qquad \frac{}{\Pi \vdash \langle \Gamma, x : B \rangle\ y = \mathsf{right}_{A,B}\ x\ \langle \Gamma, y : A + B \rangle}$$

$$\frac{\Pi \vdash \langle \Gamma, x_1 : A \rangle\ M_1\ \langle \Sigma \rangle \quad \Pi \vdash \langle \Gamma, x_2 : B \rangle\ M_2\ \langle \Sigma \rangle}{\Pi \vdash \langle \Gamma, y : A + B \rangle\ \mathsf{case}\ y\ \mathsf{of}\ \{\mathsf{left}_{A,B}\ x_1 \to M_1 \mid \mathsf{right}_{A,B}\ x_2 \to M_2\}\ \langle \Sigma \rangle}$$

$$\frac{}{\Pi \vdash \langle \Gamma, x_1 : A, x_2 : B \rangle\ x = (x_1, x_2)\ \langle \Gamma, x : A \otimes B \rangle} \qquad \frac{}{\Pi \vdash \langle \Gamma, x : A \otimes B \rangle\ (x_1, x_2) = x\ \langle \Gamma, x_1 : A, x_2 : B \rangle}$$

$$\frac{}{\Pi \vdash \langle \Gamma, x : A[\mu X.A/X] \rangle\ y = \mathsf{fold}_{\mu X.A}\ x\ \langle \Gamma, y : \mu X.A \rangle} \qquad \frac{}{\Pi \vdash \langle \Gamma, x : \mu X.A \rangle\ y = \mathsf{unfold}\ x\ \langle \Gamma, y : A[\mu X.A/X] \rangle}$$

$$\frac{\Pi, f : A \to B \vdash \langle x : A \rangle\ M\ \langle y : B \rangle}{\Pi \vdash \langle \Gamma \rangle\ \mathsf{proc}\ f :: x : A \to y : B\ \{M\}\ \langle \Gamma \rangle} \qquad \frac{}{\Pi, f : A \to B \vdash \langle \Gamma, x : A \rangle\ y = f(x)\ \langle \Gamma, y : B \rangle}$$

Fig. 1: Syntax and formation rules for QPL terms.

the subset of types where copying is an admissible operation. The classical types are a subset of our types defined in Figure 1. They are characterised by the property that variables of classical types may be copied, whereas variables of non-classical types may not be copied (see the rule for copying in Figure 1).

We use small Latin letters (e.g. x, y, u, q, b) to range over term variables. More specifically, q ranges over variables of type qbit, u over variables of unit type I, b over variables of type bit := I +I and x, y range over variables of arbitrary type. We use Γ and Σ to range over variable contexts. A variable context is a function from term variables to closed types, which we write as Γ = x<sup>1</sup> : A1,...,x<sup>n</sup> : An.

We use f,g to range over procedure names. Every procedure name f has an input type A and an output type B, denoted f : A → B, where A and B are closed types. We use Π to range over procedure contexts. A procedure context is a function from procedure names to pairs of procedure input-output types, denoted Π = f<sup>1</sup> : A<sup>1</sup> → B1,...,f<sup>n</sup> : A<sup>n</sup> → Bn.

Remark 2. Unlike lambda abstractions, procedures cannot be passed to other procedures as input arguments, nor can they be returned as output.

A term judgement has the form Π ⊢ ⟨Γ⟩ M ⟨Σ⟩ (see Figure 1) and indicates that term M is well-formed in procedure context Π with input variable context Γ and output variable context Σ. All types occurring within it are closed.

The intended interpretation of the quantum rules is as follows. The term new qbit q prepares a new qubit q in state |0⟩⟨0|. The term q₁,…,qₙ *= S applies a unitary operator S to a sequence of qubits in the standard way. The term b = measure q performs a quantum measurement on qubit q and stores the measurement outcome in bit b. The measured qubit is destroyed in the process.

The no-cloning theorem of quantum mechanics [28] shows that arbitrary qubits cannot be copied. Because of this, copying is restricted only to classical types, as indicated in Figure 1, and this allows us to avoid runtime errors. Like the original QPL [23], our type system is also affine and so any variable can be discarded (see the formation rule for the term discard x in Figure 1).

## 3 Operational Semantics of QPL

In this section we describe the operational semantics of QPL. The central notion is that of a program configuration which provides a complete description of the current state of program execution. It consists of four components that must satisfy some coherence properties: (1) the term which remains to be executed; (2) a value assignment, which is a function that assigns formal expressions to variables as a result of execution; (3) a procedure store which keeps track of what procedures have been defined so far and (4) the quantum state computed so far.

Value Assignments. A value is an expression defined by the following grammar:

v, w ::= ∗ | n | leftA,Bv | rightA,Bv | (v, w) | foldμX.Av

where n ranges over the natural numbers. Think of ∗ as representing the unique value of unit type I and of n as representing a pointer to the n-th qubit of a quantum state ρ. Specific values of interest are ff := leftI,I ∗ and tt := rightI,I ∗ which correspond to false and true respectively.

A qubit pointer context is a set Q of natural numbers. A value v of type A is well-formed in qubit pointer context Q, denoted Q ⊢ v : A, if the judgement is derivable from the following rules:

$$\frac{}{\cdot \vdash * : I} \qquad \frac{}{\{n\} \vdash n : \mathsf{qbit}} \qquad \frac{Q \vdash v : A}{Q \vdash \mathsf{left}_{A,B}\, v : A + B} \qquad \frac{Q \vdash v : B}{Q \vdash \mathsf{right}_{A,B}\, v : A + B}$$

$$\frac{Q_1 \vdash v : A \qquad Q_2 \vdash w : B \qquad Q_1 \cap Q_2 = \emptyset}{Q_1, Q_2 \vdash (v, w) : A \otimes B} \qquad \frac{Q \vdash v : A[\mu X.A/X]}{Q \vdash \mathsf{fold}_{\mu X.A}\, v : \mu X.A}$$

If v is well-formed, then its type and qubit pointer context are uniquely determined. If Q ⊢ v : P with P classical, then we say v is a classical value.

Lemma 3. If Q ⊢ v : P is a well-formed classical value, then Q = ·.

A value assignment is a function from term variables to values, which we write as V = {x₁ = v₁,…,xₙ = vₙ}, where the xᵢ are variables and the vᵢ are values. A value assignment is well-formed in qubit pointer context Q and variable context Γ, denoted Q; Γ ⊢ V, if V has exactly the same variables as Γ, so that Γ = {x₁ : A₁,…,xₙ : Aₙ}, and Q = Q₁,…,Qₙ, such that Qᵢ ⊢ vᵢ : Aᵢ. Such a splitting of Q is necessarily unique, if it exists, and some of the Qᵢ may be empty.
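The well-formedness rules determine the qubit pointer context of a value, and the tensor rule insists that the two components use disjoint pointers. A Python sketch of such a checker, over a hypothetical untyped encoding of values (tuples tagged with their outermost constructor), might read:

```python
# Compute the qubit pointer context Q of a value; tensor components must
# use disjoint pointer sets (cf. the well-formedness rule for (v, w)).
def pointers(v):
    if v == "*":
        return set()
    if isinstance(v, int):            # a pointer to the v-th qubit
        return {v}
    tag = v[0]
    if tag in ("left", "right", "fold"):
        return pointers(v[1])
    if tag == "pair":
        q1, q2 = pointers(v[1]), pointers(v[2])
        if q1 & q2:
            raise ValueError("qubit pointers shared between tensor components")
        return q1 | q2
    raise ValueError("not a value")

assert pointers(("pair", 1, ("left", 2))) == {1, 2}
# Classical values mention no qubits (cf. Lemma 3):
assert pointers(("fold", ("right", ("fold", ("left", "*"))))) == set()
```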

Procedure Stores. A procedure store is a set of procedure definitions, written as:

$$\Omega = \left\{ f_1 :: x_1 : A_1 \to y_1 : B_1 \{M_1\}, \dots, f_n :: x_n : A_n \to y_n : B_n \{M_n\} \right\}.$$

A procedure store is well-formed in procedure context Π, written Π ⊢ Ω, if the judgement is derivable via the following rules:

$$\frac{}{\cdot \vdash \cdot} \qquad \frac{\Pi \vdash \Omega \qquad \Pi, f : A \to B \vdash \langle x : A \rangle\ M\ \langle y : B \rangle}{\Pi, f : A \to B \vdash \Omega, f :: x : A \to y : B\ \{M\}}$$

Program Configurations. A program configuration is a quadruple (M | V | Ω | ρ), where M is a term, V is a value assignment, Ω is a procedure store and ρ ∈ ℂ^{2ⁿ×2ⁿ} is a finite-dimensional density matrix with 0 ≤ tr(ρ) ≤ 1. The density matrix ρ represents a (mixed) quantum state and its trace may be smaller than one, because we also use it to encode probability information (see Remark 4). We write dim(ρ) = n to indicate that the dimension of ρ is n.

A well-formed program configuration is a configuration (M | V | Ω | ρ) for which there exist (necessarily unique) Π, Γ, Σ, Q, such that: (1) Π ⊢ ⟨Γ⟩ M ⟨Σ⟩ is a well-formed term; (2) Q; Γ ⊢ V is a well-formed value assignment; (3) Π ⊢ Ω is a well-formed procedure store; and (4) Q = {1, 2,…, dim(ρ)}. We write Π; Γ; Σ; Q ⊢ (M | V | Ω | ρ) to indicate this situation. The formation rules enforce that the qubits of ρ and the qubit pointers from V are in a 1-1 correspondence.

Fig. 2: Small-step operational semantics of QPL.

The small-step semantics is defined for configurations (M | V | Ω | ρ) by induction on M in Figure 2, and we now explain the notation used therein.

In the rule for discarding, we use two functions that depend on a value v. They are tr_v, which modifies the quantum state ρ by tracing out all of its qubits which are used in v, and r_v, which simply reindexes the value assignment, so that the pointers within r_v(V) correctly point to the corresponding qubits of tr_v(ρ), which is potentially of smaller dimension than ρ. Formally, for a well-formed value v, let Q and A be the unique qubit pointer context and type such that Q ⊢ v : A. Then tr_v(ρ) is the quantum state obtained from ρ by tracing out all qubits specified by Q. Given a value assignment V = {x₁ = v₁,…,xₙ = vₙ}, then r_v(V) = {x₁ = r'_v(v₁),…,xₙ = r'_v(vₙ)}, where:

$$r'\_v(w) = \begin{cases} \*, & \text{if } w = \*\\ k - |\{i \in Q \mid i < k\}|, & \text{if } w = k \in \mathbb{N} \\ \mathtt{left}\ r'\_v(w'), & \text{if } w = \mathtt{left}\ w'\\ \mathtt{right}\ r'\_v(w'), & \text{if } w = \mathtt{right}\ w'\\ (r'\_v(w\_1), r'\_v(w\_2)) & \text{if } w = (w\_1, w\_2) \\ \mathtt{fold}\ r'\_v(w'), & \text{if } w = \mathtt{fold}\ w' \end{cases}$$
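The reindexing function r'_v transcribes almost verbatim: after the qubits in Q are traced out, each remaining pointer k drops by the number of removed indices below it. A Python sketch (using a hypothetical tuple encoding of values) is:

```python
# Reindex a value after the qubits in q_removed have been traced out:
# pointer k becomes k minus the number of removed indices below k.
def reindex(q_removed, w):
    if w == "*":
        return "*"
    if isinstance(w, int):
        return w - len([i for i in q_removed if i < w])
    tag = w[0]
    if tag in ("left", "right", "fold"):
        return (tag, reindex(q_removed, w[1]))
    if tag == "pair":
        return ("pair", reindex(q_removed, w[1]), reindex(q_removed, w[2]))
    raise ValueError("not a value")

# Discarding a value that used qubit {2}: pointers 3 and 5 slide down to 2 and 4.
assert reindex({2}, ("pair", 3, 5)) == ("pair", 2, 4)
assert reindex({2}, 1) == 1   # pointers below the removed ones are unchanged
```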

In the rule for unitaries, the superoperator S_m⃗ applies the unitary S to the vector of qubits specified by m⃗. In the rules for measurement, the m-th qubit of ρ is measured in the computational basis, the measured qubit is destroyed in the process, and the measurement outcome is stored in the bit b. More specifically, |i⟩_m = I_{2^{m−1}} ⊗ |i⟩ ⊗ I_{2^{n−m}} and _m⟨i| is its adjoint, for i ∈ {0, 1}, where I_n is the identity matrix in ℂ^{n×n}.
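The projections |i⟩_m and _m⟨i| can be spelled out concretely for small n. The following pure-Python sketch is our own illustration (naive list-of-lists matrices): it computes both subnormalised post-measurement states and confirms that their traces carry the outcome probabilities.

```python
# Measure qubit m of an n-qubit density matrix rho in the computational basis.
def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def ket(i):                      # |i> as a 2x1 column
    return [[1.0 - i], [float(i)]]

def adjoint(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def tr(rho):
    return sum(rho[i][i] for i in range(len(rho)))

def measure(rho, m, n):
    # Return the subnormalised states  _m<i| rho |i>_m  for i = 0, 1, where
    # |i>_m = I_{2^(m-1)} (x) |i> (x) I_{2^(n-m)}; the measured qubit is removed.
    out = []
    for i in (0, 1):
        ket_m = kron(kron(identity(2 ** (m - 1)), ket(i)), identity(2 ** (n - m)))
        out.append(matmul(matmul(adjoint(ket_m), rho), ket_m))
    return out

# A single qubit in state H|0>: both outcomes carry trace (= probability) 1/2.
plus = [[0.5, 0.5], [0.5, 0.5]]
rho0, rho1 = measure(plus, 1, 1)
assert abs(tr(rho0) - 0.5) < 1e-12 and abs(tr(rho1) - 0.5) < 1e-12
```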

Remark 4. Because of the way we decided to handle measurements, reduction (− ⇝ −) is a nondeterministic operation, where we encode the probabilities of reduction within the trace of our density matrices, in a similar way to [9]. Equivalently, we may see the reduction relation as probabilistic, provided that we normalise all density matrices and decorate the reductions with the appropriate probability information as specified by the Born rule of quantum mechanics. The nondeterministic view leads to a more concise and clear presentation, and because of this we have chosen it over the probabilistic view.

The introduction rule for procedures simply defines a procedure which is added to the procedure store. In the rule for calling procedures, the term M<sup>α</sup> is α-equivalent to M and is obtained from it by renaming the input x<sup>2</sup> to x1, renaming the output y<sup>2</sup> to y<sup>1</sup> and renaming all other variables within M to some fresh names, so as to avoid conflicts with the input, output and the rest of the variables within V .

Theorem 5 (Subject reduction). If Π; Γ; Σ; Q ⊢ (M | V | Ω | ρ) and (M | V | Ω | ρ) ⇝ (M' | V' | Ω' | ρ'), then Π'; Γ'; Σ; Q' ⊢ (M' | V' | Ω' | ρ'), for some (necessarily unique) contexts Π', Γ', Q', and where Σ is invariant.

Assumption 6. From now on we assume all configurations are well-formed.

(a) The term M:

```
while b do {
  new qbit q;
  q *= H;
  b = measure q
}
```

(b) A reduction graph involving M:

```
(M | b = tt | · | 1)     ⇝ ⋯ ⇝  (skip | b = ff | · | 0.5)   and  (M | b = tt | · | 0.5)
(M | b = tt | · | 0.5)   ⇝ ⋯ ⇝  (skip | b = ff | · | 0.25)  and  (M | b = tt | · | 0.25)
(M | b = tt | · | 0.25)  ⇝ ⋯ ⇝  (skip | b = ff | · | 0.125) and  (M | b = tt | · | 0.125)
  ⋮
```

A configuration (M | V | Ω | ρ) is said to be terminal if M = skip. Program execution finishes at terminal configurations, which are characterised by the property that they do not reduce any further. We will use calligraphic letters (C, D,...) to range over configurations and we will use T to range over terminal configurations. For a configuration C = (M | V | Ω | ρ), we write for brevity tr(C) := tr(ρ) and we shall say C is normalised whenever tr(C)=1. We say that a configuration C is impossible if tr(C)=0 and we say it is possible otherwise.

Theorem 7 (Progress). If C is a configuration, then either C is terminal or there exists a configuration D such that C ⇝ D. Moreover, if C is not terminal, then tr(C) = Σ_{C⇝D} tr(D), and there are at most two such configurations D.

In the situation of the above theorem, the probability of reduction is given by Pr(C ⇝ D) := tr(D)/tr(C), for any possible C (see Remark 4), and Theorem 7 shows that the total probability of all single-step reductions is 1. If C is impossible, then C occurs with probability 0 and subsequent reductions are also impossible.

Probability of Termination. Given configurations C and D, let Seq_n(C, D) := {C₀ ⇝ ⋯ ⇝ C_n | C₀ = C and C_n = D}, and let Seq_{≤n}(C, D) := ⋃_{i=0}^{n} Seq_i(C, D). Finally, let TerSeq_{≤n}(C) := ⋃_{T terminal} Seq_{≤n}(C, T). In other words, TerSeq_{≤n}(C) is the set of all reduction sequences from C which terminate in at most n steps (including 0 if C is terminal). For every terminating reduction sequence r = (C ⇝ ⋯ ⇝ T), let End(r) := T, i.e. End(r) is simply the (terminal) endpoint of the sequence.

For any configuration C, the sequence (Σ_{r ∈ TerSeq_{≤n}(C)} tr(End(r)))_{n∈ℕ} is increasing with upper bound tr(C) (this follows from Theorem 7). For any possible C, we define:

$$\operatorname{Halt}(\mathcal{C}) := \bigvee_{n=0}^{\infty} \sum_{r \in \operatorname{TerSeq}_{\leq n}(\mathcal{C})} \operatorname{tr}(\operatorname{End}(r)) / \operatorname{tr}(\mathcal{C})$$

which is exactly the probability of termination of C. This is justified, because Halt(T) = 1 for any terminal (and possible) configuration T, and Halt(C) = Σ_{C⇝D, D possible} Pr(C ⇝ D) · Halt(D). We write ⇝* for the transitive closure of ⇝.

```
proc GHZnext :: l : ListQ -> l : ListQ {
  new qbit q;
  case l of
      nil -> q *= H;
             l = q :: nil
    | q' :: l' -> q',q *= CNOT;
                  l = q :: q' :: l'
}
proc GHZ :: n : Nat -> l : ListQ {
  case n of
      zero -> l = nil
    | s(n') -> l = GHZnext(GHZ(n'))
}
```

(a) Procedures for generating GHZₙ.

```
(l = GHZ(n) | n = s(s(s(zero))) | Ω | 1)
  ⇝*  (l = GHZnext(l) | l = 2 :: 1 :: nil | Ω | γ₂)
  ⇝   (new qbit q; ··· | l = 2 :: 1 :: nil | Ω | γ₂)
  ⇝   (case l of ··· | l = 2 :: 1 :: nil, q = 3 | Ω | γ₂ ⊗ |0⟩⟨0|)
  ⇝*  (q',q *= CNOT; ··· | l' = 1 :: nil, q = 3, q' = 2 | Ω | γ₂ ⊗ |0⟩⟨0|)
  ⇝   (l = q :: q' :: l' | l' = 1 :: nil, q = 3, q' = 2 | Ω | γ₃)
  ⇝*  (skip | l = 3 :: 2 :: 1 :: nil | Ω | γ₃)
```

(b) A reduction sequence producing GHZ₃.

Fig. 4: Example with lists of qubits and a recursive procedure.

Example 8. Consider the term M in Figure 3. The body of the while loop (3a) has the effect of performing a fair coin toss (realised through quantum measurement in the standard way) and storing the outcome in variable b. Therefore, starting from configuration C = (M | b = tt | · | 1), as in Subfigure 3b, the program has the effect of tossing a fair coin until ff shows up. The set of terminal configurations reachable from C is {(skip | b = ff | · | 2^{−i}) | i ≥ 1}, and the last component of each configuration is a 1×1 density matrix which is exactly the probability of reducing to that configuration. Therefore Halt(C) = Σ_{i=1}^{∞} 2^{−i} = 1.

Example 9. The GHZₙ state is defined as γₙ := (|0⟩^{⊗n} + |1⟩^{⊗n})(⟨0|^{⊗n} + ⟨1|^{⊗n})/2. In Figure 4, we define a procedure GHZ which, given a natural number n, generates the state γₙ, represented as a list of qubits of length n. The procedure (4a) uses an auxiliary procedure GHZnext which, given a list of qubits representing the state γₙ, returns the state γₙ₊₁, again represented as a list of qubits. The two procedures make use of some (hopefully obvious) syntactic sugar. In 4b, we also present the last few steps of a reduction sequence which produces γ₃ starting from configuration (l = GHZ(n) | n = s(s(s(zero))) | Ω | 1), where Ω contains the above-mentioned procedures. In the reduction sequence we only show the term in evaluating position and we omit some intermediate steps. The type ListQ is shorthand for List(qbit) from Example 1.
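The recursion of GHZ and GHZnext can be mimicked on state vectors. The sketch below is our own simulation (amplitudes indexed by bit tuples, newest qubit at the head of the list), not the density-matrix semantics of the paper:

```python
from math import sqrt

# State vectors over n qubits as {bit-tuple: amplitude}; the empty tuple
# with amplitude 1 represents the state of zero qubits (GHZ_0).
def ghz_next(state):
    # Allocate a fresh qubit in |0> at the head of the list.
    state = {(0,) + bits: amp for bits, amp in state.items()}
    n = len(next(iter(state)))
    if n == 1:                        # nil case: put the new qubit into |+>
        return {(0,): 1 / sqrt(2), (1,): 1 / sqrt(2)}
    out = {}
    for bits, amp in state.items():   # cons case: CNOT(old head -> new qubit)
        flipped = (bits[0] ^ bits[1],) + bits[1:]
        out[flipped] = out.get(flipped, 0.0) + amp
    return out

def ghz(n):
    state = {(): 1.0}
    for _ in range(n):
        state = ghz_next(state)
    return state

g3 = ghz(3)
assert abs(g3[(0, 0, 0)] - 1 / sqrt(2)) < 1e-12
assert abs(g3[(1, 1, 1)] - 1 / sqrt(2)) < 1e-12
```

The resulting amplitudes are those of |GHZ₃⟩, which matches γ₃ = |GHZ₃⟩⟨GHZ₃| up to the density-matrix presentation.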

## 4 W\*-algebras

In this section we describe our denotational model. It is based on W\*-algebras, which are algebras of observables (i.e. physical entities) with interesting domain-theoretic properties. We recall some background on W\*-algebras and their categorical structure. We refer the reader to [25] for an encyclopaedic account of W\*-algebras.

Domain-theoretic Preliminaries. Recall that a directed subset of a poset P is a non-empty subset X ⊆ P in which every pair of elements of X has an upper bound in X. A poset P is a directed-complete partial order (dcpo) if each directed subset has a supremum. A poset P is pointed if it has a least element, usually denoted by ⊥. A monotone map f : P → Q between posets is Scott-continuous if it preserves suprema of directed subsets. If P and Q are pointed and f preserves the least element, then we say f is strict. We write **DCPO** (**DCPO**⊥!) for the category of (pointed) dcpo's and (strict) Scott-continuous maps between them.

Definition of W\*-algebras. A complex algebra is a complex vector space V equipped with a bilinear multiplication (− · −) : V × V → V, which we write as juxtaposition. A Banach algebra A is a complex algebra A equipped with a submultiplicative norm $\|-\| : A \to \mathbb{R}\_{\geq 0}$, i.e. $\forall x, y \in A : \|xy\| \leq \|x\|\|y\|$. A ∗-algebra A is a complex algebra A with an involution $(-)^{*} : A \to A$ such that $(x^{*})^{*} = x$, $(x + y)^{*} = x^{*} + y^{*}$, $(xy)^{*} = y^{*}x^{*}$ and $(\lambda x)^{*} = \bar{\lambda} x^{*}$, for $x, y \in A$ and $\lambda \in \mathbb{C}$. A C\*-algebra is a Banach ∗-algebra A which satisfies the C\*-identity, i.e. $\|x^{*}x\| = \|x\|^{2}$ for all $x \in A$. A C\*-algebra A is unital if it has an element 1 ∈ A such that for every x ∈ A : x1 = 1x = x. All C\*-algebras in this paper are unital and for brevity we regard unitality as part of their definition.

Example 10. The algebra $M\_{n}(\mathbb{C})$ of n × n complex matrices is a C\*-algebra. In particular, the set of complex numbers $\mathbb{C}$ has a C\*-algebra structure, since $M\_{1}(\mathbb{C}) \cong \mathbb{C}$. More generally, the n × n matrices valued in a C\*-algebra A also form a C\*-algebra $M\_{n}(A)$. The C\*-algebra of qubits is qbit := $M\_{2}(\mathbb{C})$.
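The C\*-identity from the definition above is easy to verify numerically for $M\_{n}(\mathbb{C})$, whose C\*-norm is the operator norm (largest singular value). The snippet below is our own illustration under that assumption:

```python
import numpy as np

def op_norm(x):
    """Operator norm on M_n(C): the largest singular value."""
    return np.linalg.norm(x, 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

star_x = x.conj().T  # the involution on M_n(C) is the conjugate transpose
print(np.isclose(op_norm(star_x @ x), op_norm(x) ** 2))   # True: the C*-identity
print(bool(op_norm(x @ x) <= op_norm(x) ** 2 + 1e-12))    # True: submultiplicativity
```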

An element x ∈ A of a C\*-algebra A is called positive if $\exists y \in A : x = y^{*}y$. The poset of positive elements of A is denoted $A^{+}$ and its order is given by $x \leq y$ iff $(y - x) \in A^{+}$. The unit interval of A is the subposet $[0, 1]\_{A} \subseteq A^{+}$ of all positive elements x such that $0 \leq x \leq 1$.

Let f : A → B be a linear map between C\*-algebras A and B. We say that f is positive if it preserves positive elements. We say that f is completely positive if it is n-positive for every $n \in \mathbb{N}$, i.e. the map $M\_{n}(f) : M\_{n}(A) \to M\_{n}(B)$, defined for every matrix $[x\_{i,j}]\_{1 \leq i,j \leq n} \in M\_{n}(A)$ by $M\_{n}(f)([x\_{i,j}]\_{1 \leq i,j \leq n}) = [f(x\_{i,j})]\_{1 \leq i,j \leq n}$, is positive. The map f is called multiplicative, involutive, or unital if it preserves multiplication, involution, or the unit, respectively. The map f is called subunital whenever the inequalities $0 \leq f(1) \leq 1$ hold. A state on a C\*-algebra A is a completely positive unital map $s : A \to \mathbb{C}$.
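The distinction between positive and completely positive is not vacuous. A standard counterexample (our own illustration, not from the paper): the transpose on $M\_{2}(\mathbb{C})$ is positive, yet $M\_{2}(\text{transpose})$ sends the positive matrix of the Bell state to a matrix with a negative eigenvalue:

```python
import numpy as np

def amplification(f, n, dim):
    """M_n(f): apply f to each dim x dim block of an (n*dim) x (n*dim) matrix,
    viewing it as an n x n matrix with entries in M_dim(C)."""
    def g(m):
        out = np.zeros_like(m)
        for i in range(n):
            for j in range(n):
                out[i*dim:(i+1)*dim, j*dim:(j+1)*dim] = \
                    f(m[i*dim:(i+1)*dim, j*dim:(j+1)*dim])
        return out
    return g

# Density matrix of the Bell state (|00> + |11>)/sqrt(2): positive, trace 1.
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5

partial_transpose = amplification(lambda a: a.T, 2, 2)
print(bool(np.linalg.eigvalsh(bell).min() >= -1e-12))             # True: bell is positive
print(bool(np.linalg.eigvalsh(partial_transpose(bell)).min() < 0))  # True: M_2(transpose) is not
```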

Although W\*-algebras are commonly defined in topological terms (as C\*-algebras closed under several operator topologies) or equivalently in algebraic terms (as C\*-algebras which are their own bicommutant), one can also equivalently define them in domain-theoretic terms [19], as we do next.

A completely positive map between C\*-algebras is normal if its restriction to the unit interval is Scott-continuous [19, Proposition A.3]. A W\*-algebra is a C\*-algebra A such that the unit interval $[0, 1]\_{A}$ is a dcpo and A has a separating set of normal states: for every $x \in A^{+}$, if $x \neq 0$, then there is a normal state $s : A \to \mathbb{C}$ such that $s(x) \neq 0$ [25, Theorem III.3.16].

A linear map f : A → B between W\*-algebras A and B is called an NCPSU-map if f is normal, completely positive and subunital. The map f is called an NMIU-map if f is normal, multiplicative, involutive and unital. We note that every NMIU-map is necessarily an NCPSU-map and that W\*-algebras are closed under formation of matrix algebras as in Example 10.

Categorical Structure. Let **W**<sup>∗</sup><sub>NCPSU</sub> be the category of W\*-algebras and NCPSU-maps and let **W**<sup>∗</sup><sub>NMIU</sub> be its full-on-objects subcategory of NMIU-maps. Throughout the rest of the paper let **C** := (**W**<sup>∗</sup><sub>NCPSU</sub>)<sup>op</sup> and let **V** := (**W**<sup>∗</sup><sub>NMIU</sub>)<sup>op</sup>. QPL types are interpreted as functors $[\Theta \vdash A] : \mathbf{V}^{|\Theta|} \to \mathbf{V}$ and closed QPL types as objects $[A] \in \mathrm{Ob}(\mathbf{V}) = \mathrm{Ob}(\mathbf{C})$. One should think of **V** as the category of values, because the interpretations of our values from §3 are indeed **V**-morphisms. General QPL terms are interpreted as morphisms of **C**, so one should think of **C** as the category of computations. We now describe the categorical structure of **V** and **C**, and later we justify our choice of working in the opposite categories.

Both **C** and **V** have a symmetric monoidal structure when equipped with the spatial tensor product, denoted here by (− ⊗ −), and tensor unit $I := \mathbb{C}$ [11, Section 10]. Moreover, **V** is symmetric monoidal closed and also complete and cocomplete [11]. **C** and **V** have finite coproducts, given by direct sums of W\*-algebras [2, Proposition 4.7.3]. The coproduct of objects A and B is denoted by A + B and the coproduct injections are denoted $\mathrm{left}\_{A,B} : A \to A + B$ and $\mathrm{right}\_{A,B} : B \to A + B$. Given morphisms f : A → C and g : B → C, we write [f, g] : A + B → C for the unique cocone morphism induced by the coproduct. Moreover, coproducts distribute over tensor products [2, §4.6]. More specifically, there exists a natural isomorphism $d\_{A,B,C} : A \otimes (B + C) \to (A \otimes B) + (A \otimes C)$ which satisfies the usual coherence conditions. The initial object in **C** is moreover a zero object and is denoted 0. The W\*-algebra of bits is $\mathbf{bit} := I + I = \mathbb{C} \oplus \mathbb{C}$.

The categories **V**, **C** and **Set** are related by symmetric monoidal adjunctions:

$$\mathbf{Set}\ \overset{F}{\underset{G}{\rightleftarrows}}\ \mathbf{V}\ \overset{J}{\rightleftarrows}\ \mathbf{C}$$

and the subcategory inclusion J preserves coproducts and tensors up to equality.

Interpreting QPL within **C** and **V** is not an ad hoc trick. In physical terms, this corresponds to adopting the Heisenberg picture of quantum mechanics, which is the usual choice when working with infinite-dimensional W\*-algebras (as we do). Semantically, this is necessary, because (1) our type system has conditional branching, so we need to interpret QPL terms within a category with finite coproducts; and (2) we have to be able to compute parameterised initial algebras in order to interpret inductive datatypes. The category **W**<sup>∗</sup><sub>NCPSU</sub> has finite products, but it does not have coproducts, so by interpreting QPL terms within **C** = (**W**<sup>∗</sup><sub>NCPSU</sub>)<sup>op</sup> we solve problem (1). For (2), the monoidal closure of **V** = (**W**<sup>∗</sup><sub>NMIU</sub>)<sup>op</sup> is crucial, because it implies that the tensor product preserves ω-colimits.

$$\begin{aligned}
&\mathsf{tr} : M\_{n}(\mathbb{C}) \to \mathbb{C}, && A \mapsto \textstyle\sum\_{i} A\_{i,i} \qquad && \mathsf{tr}^{\dagger} : \mathbb{C} \to M\_{n}(\mathbb{C}), && a \mapsto a I\_{n} \\
&\mathsf{new}\_{\rho} : \mathbb{C} \to M\_{2^{n}}(\mathbb{C}), && a \mapsto a\rho && \mathsf{new}\_{\rho}^{\dagger} : M\_{2^{n}}(\mathbb{C}) \to \mathbb{C}, && A \mapsto \mathsf{tr}(A\rho) \\
&\mathsf{meas} : M\_{2}(\mathbb{C}) \to \mathbb{C} \oplus \mathbb{C}, && A \mapsto (A\_{1,1}, A\_{2,2}) && \mathsf{meas}^{\dagger} : \mathbb{C} \oplus \mathbb{C} \to M\_{2}(\mathbb{C}), && (a, d) \mapsto \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \\
&\mathsf{unitary}\_{S} : M\_{2^{n}}(\mathbb{C}) \to M\_{2^{n}}(\mathbb{C}), && A \mapsto S A S^{\dagger} && \mathsf{unitary}\_{S}^{\dagger} : M\_{2^{n}}(\mathbb{C}) \to M\_{2^{n}}(\mathbb{C}), && A \mapsto S^{\dagger} A S
\end{aligned}$$

Fig. 5: A selection of maps in the Schrödinger picture (f : A → B) and their Hermitian adjoints (f † : B → A) used in the Heisenberg picture.

Convex Sums. In both **C** and **W**<sup>∗</sup><sub>NCPSU</sub>, morphisms are closed under convex sums, which are defined pointwise, as usual. More specifically, given NCPSU-maps $f\_1, \ldots, f\_n : A \to B$ and real numbers $p\_i \in [0, 1]$ with $\sum\_{i} p\_i \leq 1$, the map $\sum\_{i} p\_i f\_i : A \to B$ is also an NCPSU-map.

Order-enrichment. For W\*-algebras A and B, we define a partial order on **C**(A, B) by: f ≤ g iff g − f is a completely positive map. Equipped with this order, our category **C** is **DCPO**⊥!-enriched [3, Theorem 4.3]. The least element of **C**(A, B) is also a zero morphism and is given by the map **0** : A → B defined by **0**(x) = 0. Also, the coproduct structure and the symmetric monoidal structure are both **DCPO**⊥!-enriched [2, Corollary 4.9.15] [3, Theorem 4.5].

Quantum Operations. For convenience, our operational semantics adopts the Schrödinger picture of quantum mechanics, which is the picture most experts in quantum computing are familiar with. However, as we have just explained, our denotational semantics has to adopt the Heisenberg picture. The two pictures are equivalent in finite dimensions and we will now show how to translate from one to the other. By doing so, we provide an explicit description (in both pictures) of the required quantum maps that we need to interpret QPL.

Consider the maps in Figure 5. The map tr is used to trace out (or discard) parts of quantum states. Density matrices ρ are in 1-1 correspondence with the maps $\mathsf{new}\_{\rho}$, which we use in our semantics to describe (mixed) quantum states. The meas map simply measures a qubit in the computational basis and returns a bit as measurement outcome. The $\mathsf{unitary}\_{S}$ map is used for application of a unitary S. These maps work as described in the Schrödinger picture of quantum mechanics, i.e., in the category **W**<sup>∗</sup><sub>NCPSU</sub>. For every map f : A → B among those mentioned, $f^{\dagger} : B \to A$ indicates its Hermitian adjoint<sup>3</sup>. In the Heisenberg picture, composition of maps is done in the opposite way, so we simply write $f^{\ddagger} := (f^{\dagger})^{\mathrm{op}} \in \mathbf{C}(A, B)$ for the Hermitian adjoint of f seen as a morphism in (**W**<sup>∗</sup><sub>NCPSU</sub>)<sup>op</sup> = **C**. Thus, the mapping $(-)^{\ddagger}$ translates the above operations from the Schrödinger picture (the category **W**<sup>∗</sup><sub>NCPSU</sub>) to the Heisenberg picture (the category **C**) of quantum mechanics.

<sup>3</sup> This adjoint exists because A and B are finite-dimensional W\*-algebras, which therefore carry the structure of a Hilbert space when equipped with the Hilbert-Schmidt inner product [27, p. 145].
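To make Figure 5 concrete, the snippet below (our own finite-dimensional illustration, with hypothetical helper names) implements tr and new_ρ together with their Hermitian adjoints, and checks the defining adjointness property ⟨f(x), y⟩ = ⟨x, f†(y)⟩ for the Hilbert-Schmidt inner product:

```python
import numpy as np

def hs(a, b):
    """Hilbert-Schmidt inner product <a, b> = tr(a^dagger b) on M_n(C)."""
    return np.trace(a.conj().T @ b)

n = 2
tr      = lambda a: np.trace(a)        # Schrodinger picture: discard a state
tr_dag  = lambda z: z * np.eye(n)      # its Hermitian adjoint: a |-> a I_n

rho     = np.array([[0.75, 0.25], [0.25, 0.25]])  # an example density matrix
new     = lambda z: z * rho            # new_rho: prepare the state rho
new_dag = lambda a: np.trace(a @ rho)  # new_rho^dagger: A |-> tr(A rho)

rng = np.random.default_rng(1)
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
z = 0.3 + 0.7j

# <tr(a), z>_C = <a, tr_dag(z)>_HS  and  <new(z), a>_HS = <z, new_dag(a)>_C
print(np.isclose(np.conj(tr(a)) * z, hs(a, tr_dag(z))))    # True
print(np.isclose(np.conj(z) * new_dag(a), hs(new(z), a)))  # True
```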

Parameterised Initial Algebras. In order to interpret inductive datatypes, we need to be able to compute parameterised initial algebras for the functors induced by our type expressions. **V** is ideal for this, because it is cocomplete and monoidal closed and so all type expressions induce functors on **V** which preserve ω-colimits.

Definition 11 (cf. [6, §6.1]). Given a category **A** and a functor $T : \mathbf{A}^{n} \to \mathbf{A}$, with $n \geq 1$, a parameterised initial algebra for T is a pair $(T^{\sharp}, \phi^{T})$, such that:

– $T^{\sharp} : \mathbf{A}^{n-1} \to \mathbf{A}$ is a functor;

– $\phi^{T} : T \circ \langle \mathrm{Id}, T^{\sharp} \rangle \Rightarrow T^{\sharp} : \mathbf{A}^{n-1} \to \mathbf{A}$ is a natural isomorphism;

– For every $A \in \mathrm{Ob}(\mathbf{A}^{n-1})$, the pair $(T^{\sharp}A, \phi^{T}\_{A})$ is an initial $T(A, -)$-algebra.

Proposition 12. Every ω-cocontinuous functor $T : \mathbf{V}^{n} \to \mathbf{V}$ has a parameterised initial algebra $(T^{\sharp}, \phi^{T})$ with $T^{\sharp} : \mathbf{V}^{n-1} \to \mathbf{V}$ being ω-cocontinuous.

Proof. **V** is cocomplete, so this follows from [13, §4.3].

## 5 Denotational Semantics of QPL

In this section we describe the denotational semantics of QPL.

## 5.1 Interpretation of Types

The interpretation of a type $\Theta \vdash A$ is a functor $[\Theta \vdash A] : \mathbf{V}^{|\Theta|} \to \mathbf{V}$, defined by induction on the derivation of $\Theta \vdash A$ in Figure 6. As usual, one has to prove this assignment is well-defined by showing the required initial algebras exist.

Proposition 13. The assignment in Figure 6 is well-defined.

Proof. By induction, every $[\Theta \vdash A]$ is an ω-cocontinuous functor and thus it has a parameterised initial algebra by Proposition 12.

Lemma 14 (Type Substitution). Given types $\Theta, X \vdash A$ and $\Theta \vdash B$, then:

$$[\Theta \vdash A[B/X]] = [\Theta, X \vdash A] \circ \langle \mathrm{Id}, [\Theta \vdash B] \rangle.$$

Proof. Straightforward induction.

For simplicity, the interpretation of terms is defined only on closed types, so we introduce more concise notation for them. For any closed type $\cdot \vdash A$ we write $[A] := [\cdot \vdash A](\*) \in \mathrm{Ob}(\mathbf{V})$, where $\*$ is the unique object of the terminal category **1**. Notice also that $[A] \in \mathrm{Ob}(\mathbf{C}) = \mathrm{Ob}(\mathbf{V})$.

Definition 15. Given a closed type $\cdot \vdash \mu X.A$, we define an isomorphism (in **V**):

$$\mathrm{fold}\_{\mu X.A} : [A[\mu X.A/X]] = [X \vdash A]\,[\mu X.A] \cong [\mu X.A] : \mathrm{unfold}\_{\mu X.A}$$

where the equality is Lemma 14 and the iso is the initial algebra structure.

Example 16. The interpretations of the types from Example 1 are $[\mathbf{Nat}] = \bigoplus\_{i=0}^{\omega} \mathbb{C}$ and $[\mathbf{List}(A)] = \bigoplus\_{i=0}^{\omega} [A]^{\otimes i}$. Specifically, $[\mathbf{List}(\mathbf{qbit})] = \bigoplus\_{i=0}^{\omega} \mathbb{C}^{2^{i} \times 2^{i}}$.

$$\begin{aligned} [\Theta \vdash A] &: \mathbf{V}^{|\Theta|} \to \mathbf{V} \\ [\Theta \vdash \Theta\_i] &= H\_i \\ [\Theta \vdash I] &= K\_I \\ [\Theta \vdash \mathbf{qbit}] &= K\_{\mathbf{qbit}} \\ [\Theta \vdash A + B] &= + \circ \langle [\Theta \vdash A], [\Theta \vdash B] \rangle \\ [\Theta \vdash A \otimes B] &= \otimes \circ \langle [\Theta \vdash A], [\Theta \vdash B] \rangle \\ [\Theta \vdash \mu X.A] &= [\Theta, X \vdash A]^{\sharp} \end{aligned}$$

Fig. 6: Interpretations of types. $K\_{A}$ is the constant-$A$ functor.


$$\begin{aligned} [\cdot \vdash \* : I] &:= \mathrm{id}\_I \\ [n \vdash n : \mathbf{qbit}] &:= \mathrm{id}\_{\mathbf{qbit}} \\ [Q \vdash \mathbf{left}\_{A,B}\, v : A + B] &:= \mathrm{left} \circ [v] \\ [Q \vdash \mathbf{right}\_{A,B}\, v : A + B] &:= \mathrm{right} \circ [v] \\ [Q\_1, Q\_2 \vdash (v, w) : A \otimes B] &:= [v] \otimes [w] \\ [Q \vdash \mathbf{fold}\_{\mu X.A}\, v : \mu X.A] &:= \mathrm{fold} \circ [v] \end{aligned}$$

Fig. 7: Interpretation of values.


Fig. 8: Interpretation of QPL terms.

## 5.2 Copying and Discarding

Our type system is affine, so we have to construct discarding maps at all types. The tensor unit I is a terminal object in **V** (but not in **C**) which leads us to the next definition.

Definition 17 (Discarding map). For any W\*-algebra A, let $\diamond\_{A} : A \to I$ be the unique morphism of **V** with the indicated domain and codomain.

We will see that all values admit an interpretation as **V**-morphisms and are therefore discardable. In physical terms, this means values are causal (in the sense mentioned in the introduction). Of course, this is not true for the interpretation of general terms (which correspond to **C**-morphisms).

Our language is equipped with a copy operation on classical data, so we have to explain how to copy classical values. We do this by constructing a copy map defined at all classical types using results from [13,14].

Proposition 18. Using the categorical data of the adjunction $\mathbf{Set}\ \overset{F}{\underset{G}{\rightleftarrows}}\ \mathbf{V}$, one can define a copy map $\triangle\_{P} : [P] \to [P] \otimes [P]$ for every classical type $\cdot \vdash P$, such that the triple $([P], \diamond\_{[P]}, \triangle\_{P})$ forms a cocommutative comonoid in **V**.

We shall later see that the interpretations of our classical values are comonoid homomorphisms (w.r.t. Proposition 18) and therefore they may be copied.

## 5.3 Interpretation of Terms

Given a variable context $\Gamma = x\_1 : A\_1, \ldots, x\_n : A\_n$, we interpret it as the object $[\Gamma] := [A\_1] \otimes \cdots \otimes [A\_n] \in \mathrm{Ob}(\mathbf{C})$. The interpretation of a procedure context $\Pi = f\_1 : A\_1 \to B\_1, \ldots, f\_n : A\_n \to B\_n$ is defined to be the pointed dcpo $[\Pi] := \mathbf{C}([A\_1], [B\_1]) \times \cdots \times \mathbf{C}([A\_n], [B\_n])$. A term $\Pi \vdash \langle \Gamma \rangle\, M\, \langle \Sigma \rangle$ is interpreted as a Scott-continuous function $[\Pi \vdash \langle \Gamma \rangle\, M\, \langle \Sigma \rangle] : [\Pi] \to \mathbf{C}([\Gamma], [\Sigma])$ defined by induction on the derivation of $\Pi \vdash \langle \Gamma \rangle\, M\, \langle \Sigma \rangle$ in Figure 8. For brevity, we often write $[M] := [\Pi \vdash \langle \Gamma \rangle\, M\, \langle \Sigma \rangle]$ when the contexts are clear or unimportant.

We now explain some of the notation used in Figure 8. The rules for manipulating qubits use the morphisms $\mathsf{new}\_{|0\rangle\langle 0|}^{\ddagger}$, $\mathsf{meas}^{\ddagger}$ and $\mathsf{unitary}\_{S}^{\ddagger}$, which are defined in §4. For the interpretation of while loops, given an arbitrary morphism $f : A \otimes \mathbf{bit} \to A \otimes \mathbf{bit}$ of **C**, we define a Scott-continuous endofunction

$$\begin{aligned} W\_f: \mathbf{C} \left( A \otimes \mathbf{bit}, A \otimes \mathbf{bit} \right) &\to \mathbf{C} (A \otimes \mathbf{bit}, A \otimes \mathbf{bit})\\ W\_f(g) &= \left[ \mathrm{id} \otimes \mathrm{left}\_{I,I}, \, g \circ f \circ (\mathrm{id} \otimes \mathrm{right}\_{I,I}) \right] \circ d\_{A,I,I}, \end{aligned}$$

where the isomorphism $d\_{A,I,I} : A \otimes (I + I) \to (A \otimes I) + (A \otimes I)$ is explained in §4. For any pointed dcpo D and Scott-continuous function $h : D \to D$, its least fixpoint is $\mathrm{lfp}(h) := \bigvee\_{i=0}^{\infty} h^{i}(\bot)$, where ⊥ is the least element of D.
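The Kleene formula $\mathrm{lfp}(h) = \bigvee\_{i} h^{i}(\bot)$ can be illustrated on the coin-toss loop of Example 8, with density matrices simplified to plain probabilities. This is our own sketch of the fixpoint iteration, not the paper's semantics:

```python
def body(dist):
    """Loop body of Example 8: toss a fair coin into b."""
    mass = dist["tt"] + dist["ff"]
    return {"tt": mass / 2, "ff": mass / 2}

def W(loop):
    """One step of the while-loop functional: exit on ff, otherwise run the
    body and continue with the previous approximation `loop`."""
    def approx(dist):
        exited = {"tt": 0.0, "ff": dist["ff"]}
        continued = loop(body({"tt": dist["tt"], "ff": 0.0}))
        return {k: exited[k] + continued[k] for k in ("tt", "ff")}
    return approx

bottom = lambda dist: {"tt": 0.0, "ff": 0.0}  # least element: the zero map

lfp = bottom
for _ in range(60):          # W^60(bottom) approximates the least fixpoint
    lfp = W(lfp)

result = lfp({"tt": 1.0, "ff": 0.0})  # start inside the loop with b = tt
print(result["tt"], round(result["ff"], 12))  # 0.0 1.0: all mass eventually exits
```

Each iterate $W^{n}(\bot)$ accounts for runs that exit within n tosses; the supremum of these approximants assigns the full probability 1 to termination, matching Halt(C) = 1.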

Remark 19. The term semantics for defining and calling procedures does not involve any fixpoint computations. The required fixpoint computations are done when interpreting procedure stores, as we shall see next.

#### 5.4 Interpretation of Configurations

Before we may interpret program configurations, we first have to describe how to interpret values and procedure stores.

Interpretation of Values. A qubit pointer context Q is interpreted as the object $[Q] = \mathbf{qbit}^{\otimes |Q|}$. A value $Q \vdash v : A$ is interpreted as a morphism $[Q \vdash v : A] : [Q] \to [A]$ in **V**, which we abbreviate as $[v]$ if Q and A are clear from context. It is defined by induction on the derivation of $Q \vdash v : A$ in Figure 7.

For the next theorem, recall that if $Q \vdash v : A$ is a classical value, then $Q = \cdot$.

Theorem 20. Let $Q \vdash v : A$ be a value. Then:

1. $[v]$ is discardable (i.e. causal). More specifically, $\diamond\_{[A]} \circ [v] = \diamond\_{[Q]} = \mathsf{tr}^{\ddagger}$.
2. If A is classical, then $[v]$ is copyable, i.e., $\triangle\_{[A]} \circ [v] = ([v] \otimes [v]) \circ \triangle\_{I}$.

We see that, as promised, interpretations of values may always be discarded and interpretations of classical values may also be copied. Next, we explain how to interpret value contexts. For a value context Q; Γ V , its interpretation is the morphism:

$$[Q; \Gamma \vdash V] = \big( [Q] \xrightarrow{\ \cong\ } [Q\_1] \otimes \cdots \otimes [Q\_n] \xrightarrow{[v\_1] \otimes \cdots \otimes [v\_n]} [\Gamma] \big),$$

where $Q\_i \vdash v\_i : A\_i$ is the splitting of Q (see §3) and $[\Gamma] = [A\_1] \otimes \cdots \otimes [A\_n]$. Some of the $Q\_i$ can be empty, and this is the reason why the definition depends on a coherent natural isomorphism. We write $[V]$ as a shorthand for $[Q; \Gamma \vdash V]$. Obviously, $[V]$ is also causal thanks to Theorem 20.

Interpretation of Procedure Stores. The interpretation of a well-formed procedure store $\Pi \vdash \Omega$ is an element of $[\Pi]$, i.e. a $|\Pi|$-tuple of morphisms from **C**. It is defined by induction on $\Pi \vdash \Omega$:

$$\begin{aligned} [\cdot \vdash \cdot] &= () \\ [\Pi, f : A \to B \vdash \Omega, f :: x : A \to y : B \{M\}] &= \big( [\Pi \vdash \Omega],\ \mathrm{lfp}\big( g \mapsto [M]([\Pi \vdash \Omega], g) \big) \big) \end{aligned}$$

Interpretation of Configurations. Density matrices $\rho \in M\_{2^{n}}(\mathbb{C})$ are in 1-1 correspondence with $\mathbf{W}^{*}\_{\mathrm{NCPSU}}$-morphisms $\mathsf{new}\_{\rho} : \mathbb{C} \to M\_{2^{n}}(\mathbb{C})$, which are in turn in 1-1 correspondence with **C**-morphisms $\mathsf{new}\_{\rho}^{\ddagger} : I \to \mathbf{qbit}^{\otimes n}$. Using this observation, we can now define the interpretation of a configuration $\mathcal{C} = (M \mid V \mid \Omega \mid \rho)$ with $\Pi; \Gamma; \Sigma; Q \vdash (M \mid V \mid \Omega \mid \rho)$ to be the morphism

$$[(M \mid V \mid \Omega \mid \rho)] = \big( I \xrightarrow{\ \mathsf{new}\_{\rho}^{\ddagger}\ } [Q] \xrightarrow{\ [V]\ } [\Gamma] \xrightarrow{\ [M]([\Omega])\ } [\Sigma] \big).$$
For brevity, we simply write -(<sup>M</sup> <sup>|</sup> <sup>V</sup> <sup>|</sup> <sup>Ω</sup> <sup>|</sup> <sup>ρ</sup>) or even just -<sup>C</sup> to refer to the above morphism.

#### 5.5 Soundness, Adequacy and Big-step Invariance

Since our operational semantics allows for branching, soundness amounts to showing that the interpretation of a configuration equals the sum of the interpretations of its small-step reducts.

Theorem 21 (Soundness). For any non-terminal configuration $\mathcal{C}$:

$$[\mathcal{C}] = \sum\_{\mathcal{C} \leadsto \mathcal{D}} [\mathcal{D}].$$

Proof. By induction on the shape of the term component of C.

Remark 22. The above sum and all sums that follow are well-defined convex sums of NCPSU-maps, where the probability weights $p\_i$ have been encoded in the density matrices.

A natural question to ask is whether $[\mathcal{C}]$ is also equal to the (potentially infinite) sum over all terminal configurations that $\mathcal{C}$ reduces to; in other words, whether the interpretation of configurations is also invariant with respect to big-step reduction. This is indeed the case, and proving it requires considerable effort.

Theorem 23 (Big-step Invariance). For any configuration C, we have:

$$[\mathcal{C}] = \bigvee\_{n=0}^{\infty} \sum\_{r \in \mathrm{TerSeq}\_{\leq n}(\mathcal{C})} [\mathrm{End}(r)].$$

The above theorem is the main result of our paper. It is a powerful result, because with big-step invariance in place, computational adequacy<sup>4</sup> at all types is now a simple consequence of the causal properties of our interpretation. Observe that for any configuration $\mathcal{C}$, we have a subunital map $\diamond \circ [\mathcal{C}] : \mathbb{C} \to \mathbb{C}$, and evaluating it at 1 yields a real number $(\diamond \circ [\mathcal{C}])(1) \in [0, 1]$.

#### Theorem 24 (Adequacy). For any normalised $\mathcal{C}$: $(\diamond \circ [\mathcal{C}])(1) = \mathrm{Halt}(\mathcal{C})$.

If $\mathcal{C}$ is not normalised, then adequacy can be recovered simply by normalising: $(\diamond \circ [\mathcal{C}])(1) = \mathrm{tr}(\mathcal{C})\,\mathrm{Halt}(\mathcal{C})$ for any configuration $\mathcal{C}$. The adequacy formulation of [17] and [5] is now a special case of our more general formulation.

Corollary 25. Let M be a closed program of unit type, i.e. $\cdot \vdash \langle \cdot \rangle\, M\, \langle \cdot \rangle$. Then:

$$[(M \mid \cdot \mid \cdot \mid 1)](1) = \mathrm{Halt}(M \mid \cdot \mid \cdot \mid 1).$$

Proof. By Theorem 24 and because $\diamond\_{I} = \mathrm{id}$.

<sup>4</sup> Recall that a computational adequacy result has to establish an equivalent purely denotational characterisation of the operational notion of non-termination.

## 6 Conclusion and Related Work

There are many quantum programming languages described in the literature; for a survey see [7] and [16, p. 129]. Some circuit programming languages (e.g. Proto-Quipper [21,22,15]) generate quantum circuits, but do not necessarily support executing quantum measurements. Here we focus on quantum languages which support measurement and which have either inductive datatypes or some computational adequacy result.

Our work is the first to present a detailed semantic treatment of user-defined inductive datatypes for quantum programming. In [17] and [5], the authors show how to interpret a quantum lambda calculus extended with a datatype for lists, but their syntax does not support any other inductive datatypes. These languages are equipped with lambda abstractions, whereas our language supports only procedures. Lambda abstractions are modelled using constructions from quantitative semantics of linear logic in [17] and techniques from game semantics in [5]. We believe our model is simpler and certainly more physically natural, because we work only with mathematical structures used by physicists in their study of quantum mechanics. Both [17] and [5] prove an adequacy result for programs of unit type. In [20], the authors discuss potential categorical models for inductive datatypes in quantum programming, but no detailed semantic treatment is provided and there is no adequacy result, because the language lacks recursion.

Other quantum programming languages without inductive datatypes, but with computational adequacy results, include [9,12]. A model based on W\*-algebras for a quantum lambda calculus without recursion or inductive datatypes was described in a recent manuscript [4]. In that model, it appears that currying is not a Scott-continuous operation; if so, the addition of recursion would render the model neither sound nor adequate. For this reason, we use procedures and not lambda abstractions in our language.

To conclude, we presented two novel results in quantum programming: (1) we provided a denotational semantics for a quantum programming language with inductive datatypes; (2) we proved that our denotational semantics is invariant with respect to big-step reduction. We also showed that the latter result is quite powerful by demonstrating how it immediately implies computational adequacy.

Our denotational model is based on W\*-algebras, which are used by physicists to study quantum foundations. We hope this will make it useful for developing static analysis methods (based on abstract interpretation) that can be used for entanglement detection [18], and we plan to investigate this in future work.

Acknowledgements. We thank Andre Kornell, Bert Lindenhovius and Michael Mislove for discussions regarding this paper. We also thank the anonymous referees for their feedback. MR acknowledges financial support from the Quantum Software Consortium, under the Gravitation programme of the Dutch Research Council NWO. The remaining authors were supported by the French projects ANR-17-CE25-0009 SoftQPro, ANR-17-CE24-0035 VanQuTe and PIA-GDN/Quantex.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Spinal Atomic Lambda-Calculus**

David Sherratt<sup>1</sup> (-), Willem Heijltjes2, Tom Gundersen3, and Michel Parigot<sup>4</sup>

<sup>1</sup> Friedrich-Schiller-Universität Jena, Germany. david.rhys.sherratt@uni-jena.de <sup>2</sup> University of Bath, United Kingdom. w.b.heijltjes@bath.ac.uk <sup>3</sup> Red Hat, Inc., Norway. teg@jklm.no <sup>4</sup> Institut de Recherche en Informatique Fondamentale, CNRS, Université de Paris, France.

parigot@irif.fr

**Abstract.** We present the spinal atomic λ-calculus, a typed λ-calculus with explicit sharing and atomic duplication that achieves spinal full laziness: duplicating only the direct paths between a binder and bound variables is enough for beta reduction to proceed. We show this calculus is the result of a Curry–Howard style interpretation of a deep-inference proof system, and prove that it has natural properties with respect to the λ-calculus: confluence and preservation of strong normalisation.

**Keywords:** Lambda-Calculus · Full laziness · Deep inference · Curry– Howard

## **1 Introduction**

In the λ-calculus, a main source of efficiency is *sharing*: multiple use of a single subterm, commonly expressed through graph reduction [27] or explicit substitution [1]. This work, and the *atomic* λ*-calculus* [16] on which it builds, is an investigation into sharing as it occurs naturally in intuitionistic *deep-inference* proof theory [26]. The atomic λ-calculus arose as a Curry–Howard interpretation of a deep-inference proof system, in particular of the *distribution* rule given below left, a variant of the characteristic *medial* rule [10, 26]. In the term calculus, the corresponding *distributor* enables duplication to proceed *atomically*, on individual constructors, in the style of sharing graphs [21]. As a consequence, the natural reduction strategy in the atomic λ-calculus is *fully lazy* [27, 4]: it duplicates only the minimal part of a term, the *skeleton*, that can be obtained by lifting out subterms as explicit substitutions. (While duplication is atomic *locally*, a duplicated abstraction does not form a redex until also its bound variables have been duplicated; hence duplication becomes fully lazy *globally*.)


This work was supported by EPSRC Project EP/R029121/1 *Typed Lambda-Calculi with Sharing and Unsharing* and ANR project 15-CE25-0014 *The Fine Structure of Formal Proof Systems and their Computational Interpretations (FISP)*.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 582–601, 2020. https://doi.org/10.1007/978-3-030-45231-5_30

$$\text{Distribution:} \quad \frac{A \to (B \wedge C)}{(A \to B) \wedge (A \to C)}\,d \qquad\qquad \text{Switch:} \quad \frac{(A \to B) \wedge C}{A \to (B \wedge C)}\,s$$

We investigate the computational interpretation of another characteristic deep-inference proof rule: the *switch* rule above right [26].<sup>5</sup> Our result is the *spinal atomic* λ*-calculus*, a λ-calculus with a refined form of full laziness, *spine duplication*. In the terminology of [4], this strategy duplicates only the *spine* of an abstraction: the paths to its bound variables in the syntax tree of the term.<sup>6</sup>

We illustrate these notions in Figure 1, for the example λx.λy.((λz.z)y)x. The *scope* of the abstraction λx is the entire subterm, λy.((λz.z)y)x (which may or may not be taken to include λx itself). Note that with explicit substitution, the scope may grow or shrink by lifting explicit substitutions in or out. The *skeleton* is the term λx.λy.(wy)x where the subterm λz.z is lifted out as an (explicit) substitution [λz.z/w]. The *spine* of a term, indicated in the second image, cannot naturally be expressed with explicit substitution, though one can get an impression with *capturing* substitutions: it would be λx.λy.wx, with the subterm (λz.z)y extracted by a capturing substitution [(λz.z)y/w]. Observe that the skeleton can be described as the *iterated spine*: it is the smallest subgraph of the syntax tree closed under taking the spine of each abstraction, i.e. that contains the spine of every abstraction it contains.
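To make the notion of spine concrete, the following small Python sketch (our own illustration, not part of the paper) encodes λ-terms as nested tuples and collects the subterms lying on the direct paths from the root of an abstraction body to the free occurrences of its bound variable:

```python
# Terms: ('var', name) | ('lam', name, body) | ('app', fun, arg)

def show(t):
    tag = t[0]
    if tag == 'var':
        return t[1]
    if tag == 'lam':
        return f"\\{t[1]}.{show(t[2])}"
    return f"({show(t[1])} {show(t[2])})"

def spine(t, x):
    """Subterms on the direct paths from the root to free occurrences of x."""
    tag = t[0]
    if tag == 'var':
        return [t] if t[1] == x else []
    if tag == 'lam':
        if t[1] == x:              # x is rebound below: no free occurrences
            return []
        inner = spine(t[2], x)
        return [t] + inner if inner else []
    left, right = spine(t[1], x), spine(t[2], x)
    return ([t] + left + right) if (left or right) else []

# The running example \x.\y.((\z.z) y) x
term = ('lam', 'x', ('lam', 'y',
        ('app', ('app', ('lam', 'z', ('var', 'z')), ('var', 'y')),
                ('var', 'x'))))
body = term[2]
print([show(s) for s in spine(body, 'x')])
```

On the running example, the spine of λx contains the binder λy and the outer application, but not the subterm (λz.z)y, matching the description above.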

These notions give rise to four natural duplication regimes. For a shared abstraction to become available as the function in a β-redex: *laziness* duplicates its *scope* [22]; *full laziness* duplicates its *skeleton* [27]; *spinal full laziness* duplicates its *spine* [8]; *optimal reduction* duplicates only the abstraction λx and its bound variables x [21, 3].<sup>7</sup>

While each of these duplication strategies has been expressed in graphs and labelled calculi, the atomic λ-calculus is the first term calculus with Curry– Howard corresponding proof system to naturally describe full laziness. Likewise, the spinal atomic λ-calculus presented here is the first term calculus with Curry– Howard corresponding proof system to naturally describe spinal full laziness.

*Switch and Spine.* One way to describe the skeleton or the spine of an abstraction within a λ-term is through explicit end-of-scope markers, as explored by Berkling and Fehr [7], and more recently by Hendriks and Van Oostrom [18]. We use their *adbmal* (ƛ, a mirrored λ) to illustrate the idea: the constructor ƛx.N indicates that the subterm N does not contain occurrences of x (or that any that do occur are

<sup>5</sup> The switch rule is an intuitionistic variant of *weak* or *linear distributivity* [12] for multiplicative linear logic.

<sup>6</sup> There is a clash of (existing) terminology: the *spine of an abstraction*, as we use here, is a different notion from the *spine of a* λ*-term*, which is the path from the root to the leftmost variable, as used e.g. in head reduction and abstract machines.

<sup>7</sup> Interestingly, Balabonski [5] shows that for *weak* reduction (where one does not reduce under an abstraction) full laziness and spinal full laziness are both optimal (in the number of beta-steps required to reach a normal form).

Fig. 1: Balanced and unbalanced typing derivations for λx.λy.((λz.z)y)x, with corresponding graphical representations of the term. The variable x has type A and y, z type A → B, shortened to B<sup>A</sup>. The left derivation isolates the skeleton of λx, and the right derivation its spine, both by the subderivations in braces.

not available to a binder λx outside ƛx.N). The scope of an abstraction thus becomes explicitly indicated in the term. This opens up a distinction between *balanced* and *unbalanced* scopes: whether scopes must be properly nested, or not; for example, in λx.λy.N, a subterm ƛy.ƛx.M is balanced, but ƛx.ƛy.M is not. With balanced scope, one can indicate the skeleton of an abstraction; with unbalanced scope (which Hendriks and Van Oostrom dismiss) one can indicate the spine. We do so for our example term λx.λy.((λz.z)y)x below.


A closely related approach is *director strings*, introduced by Kennaway and Sleep [19] for combinator reduction and generalized to any reduction strategy by Fernández, Mackie, and Sinot in [13]. The idea is to use nameless abstractions identified by their nesting (as with De Bruijn indices), and to make the paths to bound variables explicit by annotating each constructor with a string of *directors* that outline the paths. The primary aim of these approaches is to eliminate α-conversion and to streamline substitution. Consequently, while they can *identify* the spine, they do not readily isolate it for duplication.

The present work starts from our observation that the *switch* rule of open deduction functions as a proof-theoretic end-of-scope construction (see [25] for details). However, it does so in a *structural* way: it forces a deconstruction of a proof into readily duplicable parts, which together may form the spine of an abstraction. The derivations in Figure 1 demonstrate this, as we will now explain; see the next section for how they are formally constructed.

The abstraction λx corresponds in the proof system to the implication A → −, which explicitly scopes over its right-hand side. On the left, with the *abstraction* rule (λ), scopes must be balanced, and the proof system may identify the *skeleton*; here, that of λx is the largest blue box. Decomposing the abstraction rule (λ) into *axiom* (a) and *switch* (s), on the right, the proof system may express unbalanced scope. It does so by separating the scope of an abstraction into multiple parts; here, that of λx is captured as the two top-level red boxes. Each box is ready to be duplicated; in this way, one may duplicate only the spine of an abstraction.

These two derivations correspond to terms in our calculus. The subterms not part of the skeleton (i.e. λz.z) remain shared, and we are able to duplicate the skeleton alone; this is also possible in [16]. In our calculus we are additionally able to duplicate just the spine, by using a *distributor*. We require this construct because otherwise we would break the binding of the y-abstraction; the distributor manages and maintains these bindings. The y-abstraction in the spine (y⟨a⟩) is a *phantom-abstraction*: it is not a real abstraction, and we cannot perform β-reduction on it, though it may become real during reduction. It can be seen as a placeholder for the abstraction. The variables in the *cover* (a) represent subterms that both remain shared and are found in the distributor.

$$\begin{array}{ll}\text{Skeleton:} & \lambda x.\lambda y.(a\,y)\,x\;[a \leftarrow \lambda z.z]\\[2pt]\text{Spine:} & \lambda x.\,y\langle a\rangle.a\,x\;[\,y\langle a\rangle \mid \lambda y.[a \leftarrow (\lambda z.z)\,y]\,]\end{array}$$

Our investigation is then focused on the interaction of switch and distribution (later observed in the rewrite rule l5). The use of the distribution rule allows us to perform duplication atomically, and thus provides a natural strategy for spinal full laziness. In Figure 1 on the right, this means duplicating the two top-level red boxes can be done independently from duplicating the yellow box.

## **2 Typing a** *λ***-calculus in open deduction**

We work in *open deduction* [15], a formalism of deep-inference proof theory, using the following proof system for (conjunction–implication) intuitionistic logic. A *derivation* from a *premise* formula X to a *conclusion* formula Z is constructed inductively as in Figure 2a, with from left to right: a propositional atom a, where X = Z = a; *horizontal composition* with a connective →, where X = Y → X₂ and Z = Y → Z₂; *horizontal composition* with a connective ∧, where X = X₁ ∧ X₂ and Z = Z₁ ∧ Z₂; and *rule composition*, where r is an inference rule (Figure 2b) from Y₁ to Y₂. The boxes serve as parentheses (since derivations extend in two dimensions) and may be omitted. Derivations are considered up to associativity of rule composition. One may consider formulas as derivations that omit rule composition. We work modulo associativity, symmetry, and unitality of conjunction, justifying the n-ary contraction, and may omit ⊤ from the axiom rule. A 0-ary contraction, with conclusion ⊤, is a *weakening*. In Figure 2b, the abstraction rule (λ) is derived from axiom and switch. *Vertical composition* of a derivation from X to Y and one from Y to Z, depicted by a dashed line, is a defined operation, given in Figure 2c, where ∗ ∈ {∧, →}.

#### **2.1 The Sharing Calculus**

Our starting point is the *sharing calculus* (Λ<sup>S</sup>), a calculus with an explicit sharing construct, similar to explicit substitution.

(c) Vertical composition

Fig. 2: Intuitionistic proof system in open deduction

**Definition 1.** *The pre-terms* r, s, t, u *and sharings* [Γ] *of the* Λ<sup>S</sup> *are defined by:*

s, t ∶∶= x ∣ λx.t ∣ s t ∣ t[Γ]        [Γ] ∶∶= [x₁,...,xₙ ← s]

*with from left to right: a variable; an abstraction, where* x *occurs free in* t *and becomes bound; an application, where* s *and* t *use distinct variable names; and a closure: in* t[x⃗ ← s] *the variables in the vector* x⃗ = x₁,...,xₙ *all occur in* t *and become bound, and* s *and* t *use distinct variable names. Terms are pre-terms modulo permutation equivalence (*∼*):*

$$t[\vec{x} \leftarrow s][\vec{y} \leftarrow r] \sim t[\vec{y} \leftarrow r][\vec{x} \leftarrow s] \qquad (\{\vec{y}\} \cap (s)_{fv} = \{\})$$

*A term is in sharing normal form if all sharings occur as* [x⃗ ← x] *either at the top level or directly under a binding abstraction, as* λx.t[x⃗ ← x]*.*

Note that variables are *linear*: variables occur at most once, and bound variables must occur. A vector x⃗ has length ∣x⃗∣ and consists of the variables x₁,...,x<sub>∣x⃗∣</sub>. An **environment** is a sequence of sharings [Γ] = [Γ₁]...[Γₙ]. Substitution is written {t/x}, and {t₁/x₁}...{tₙ/xₙ} may be abbreviated to {tᵢ/xᵢ}<sub>i∈[n]</sub>.
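As an illustration of Definition 1 and the permutation equivalence, here is a Python sketch (our own hypothetical encoding, not the paper's artifact) of Λ<sup>S</sup> pre-terms, with the side condition {y⃗} ∩ (s)<sub>fv</sub> = {} checked before two adjacent sharings are swapped:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Close:
    body: object            # t in t[xs <- shared]
    xs: Tuple[str, ...]     # the shared variables x1,...,xn
    shared: object          # the shared term s

def fv(t):
    """Free variables of a pre-term."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return fv(t.body) - {t.var}
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)
    return (fv(t.body) - set(t.xs)) | fv(t.shared)

def permute(t):
    """t[xs <- s][ys <- r] ~ t[ys <- r][xs <- s], allowed when {ys} ∩ fv(s) = ∅."""
    if isinstance(t, Close) and isinstance(t.body, Close):
        inner = t.body
        if not (set(t.xs) & fv(inner.shared)):
            return Close(Close(inner.body, t.xs, t.shared),
                         inner.xs, inner.shared)
    return t
```

The frozen dataclasses give structural equality for free, so permuted closures can be compared directly.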

**Definition 2.** *The interpretation* ⟦−⟧ ∶ Λ<sup>S</sup> → Λ *is defined below.*

$$\llbracket x \rrbracket = x \qquad \llbracket \lambda x.t \rrbracket = \lambda x.\llbracket t \rrbracket \qquad \llbracket s\,t \rrbracket = \llbracket s \rrbracket\,\llbracket t \rrbracket \qquad \llbracket t[\vec{x} \leftarrow s] \rrbracket = \llbracket t \rrbracket\{\llbracket s \rrbracket/x_i\}_{i\in[n]}$$

The **translation** ⌜N⌝ of a λ-term N is the unique sharing-normal term t such that N = ⟦t⟧. A term t will be typed by a derivation with restricted types:

Basic types: A, B, C ∶∶= a ∣ A → B        Context types: Γ, Δ, Ω ∶∶= A ∣ ⊤ ∣ Γ ∧ Δ

Fig. 3: Typing system for Λ<sup>S</sup>

as shown in Figure 3, where the *context type* Γ = A₁ ∧⋅⋅⋅∧ Aₙ will have an Aᵢ for each free variable xᵢ of t. We connect free variables to their premises by writing A<sup>x</sup> and Γ<sup>x⃗</sup>. Λ<sup>S</sup> is then typed as in Figure 3.

## **3 The Spinal Atomic** *λ***-Calculus**

We now formally introduce the syntax of the spinal atomic λ-calculus (Λ<sup>S</sup><sub>a</sub>), by extending the definition of the sharing calculus in Definition 1 with a *distributor* construct that allows for atomic duplication of terms.

**Definition 3 (Pre-Terms).** *The pre-terms* r, s, t*, closures* [Γ]*, and environments* $\overline{[\Gamma]}$ *of the* Λ<sup>S</sup><sub>a</sub> *are defined by:*

$$s, t ::= x \mid s\,t \mid x\langle\vec{y}\rangle.t \mid t[\Gamma] \qquad [\Gamma] ::= [\vec{x} \leftarrow t] \mid [\vec{x} \mid y\langle\vec{z}\rangle\,\overline{[\Gamma]}\,] \qquad \overline{[\Gamma]} ::= [\Gamma_1]\dots[\Gamma_n]$$

Our generalized abstraction x⟨y⃗⟩.t is a **phantom-abstraction**, where x is a **phantom-variable** and the **cover** y⃗ will be a subset of the free variables of t. It can be thought of as a "delayed" abstraction: x is a binder, but possibly not in t itself, and instead in the terms substituted for the variables y⃗; in other words, x is a *capturing* binder for substitution into y⃗. We define standard λ-abstraction as the special case λx.t ≡ x⟨x⟩.t, and generally, when we refer to x⟨y⃗⟩ as a phantom-abstraction (rather than an abstraction) we assume y⃗ ≠ x. The **distributor** u[x⃗ ∣ y⟨z⃗⟩[Γ]] binds the phantom-variables x⃗ in u, while its environment [Γ] will bind the variables in their covers; intuitively, it represents a set of explicit substitutions in which the variables x⃗ are expected to be captured.
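The phantom-abstraction and the special case λx.t ≡ x⟨x⟩.t can be sketched as follows (a hypothetical Python encoding of ours, for illustration only):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Phantom:
    """x⟨y1,...,yn⟩.t: the cover lists the variables of t into which
    the phantom-variable x may later capture-bind."""
    var: str
    cover: Tuple[str, ...]
    body: object

def lam(x, t):
    """Ordinary abstraction as the special case λx.t ≡ x⟨x⟩.t."""
    return Phantom(x, (x,), t)

def is_real_abstraction(p):
    """A phantom-abstraction behaves as a real λ-abstraction (and may
    form a β-redex) only when its cover is exactly its own variable."""
    return isinstance(p, Phantom) and p.cover == (p.var,)
```

The predicate mirrors the convention above: y⃗ ≠ x marks a proper phantom, on which β-reduction is not available.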

The distributor is introduced when we wish to duplicate an abstraction, as depicted in Figure 4a. The sharing node (○) duplicates the abstraction node, creating a distributor (depicted as the sharing and unsharing nodes (●)), together with the bindings of the phantom-variables (depicted with a dashed line). The variables captured by the environment are the variables connected to sharing nodes linked with a dotted line. Notice one sharing node can be linked with multiple unsharing nodes, and vice versa. Duplication of applications also duplicates

Fig. 4: Graphical illustration of the distributor

the dotted line (Figure 4b), but these can be removed later if the term does not contain the variable bound to the unsharing (Figure 4c). These subterms are those which are not part of the spine. Eventually, we will reach a state where the only sharing node connected to the unsharing node is the one that shared the variable bound to the unsharing, allowing us to eliminate the distributor (Figure 4d). The purpose of the dotted line is similar to the brackets of optimal reduction graphs [21, 24], to supervise which sharing and unsharing match.

Terms are then pre-terms with sensible and correct bindings. To define terms, we first define *free* and *bound* variables and phantom-variables; variables are bound by abstractions (not phantom-abstractions) and by sharings, while phantom-variables are bound by distributors.

**Definition 4 (Free and Bound Variables).** *The free variables* (−)<sub>fv</sub> *and bound variables* (−)<sub>bv</sub> *of a pre-term* t *are defined as follows:*

$$\begin{array}{ll}
(x)_{fv} = \{x\} & (x)_{bv} = \{\}\\
(s\,t)_{fv} = (s)_{fv} \cup (t)_{fv} & (s\,t)_{bv} = (s)_{bv} \cup (t)_{bv}\\
(x\langle x\rangle.t)_{fv} = (t)_{fv} - \{x\} & (x\langle x\rangle.t)_{bv} = (t)_{bv} \cup \{x\}\\
(x\langle\vec{y}\rangle.t)_{fv} = (t)_{fv} & (x\langle\vec{y}\rangle.t)_{bv} = (t)_{bv}\\
(u[\vec{x} \leftarrow t])_{fv} = (u)_{fv} \cup (t)_{fv} - \{\vec{x}\} & (u[\vec{x} \leftarrow t])_{bv} = (u)_{bv} \cup (t)_{bv} \cup \{\vec{x}\}\\
(u[\vec{x} \mid y\langle y\rangle\,\overline{[\Gamma]}\,])_{fv} = (u\,\overline{[\Gamma]})_{fv} - \{y\} & (u[\vec{x} \mid y\langle y\rangle\,\overline{[\Gamma]}\,])_{bv} = (u\,\overline{[\Gamma]})_{bv} \cup \{y\}\\
(u[\vec{x} \mid y\langle\vec{z}\rangle\,\overline{[\Gamma]}\,])_{fv} = (u\,\overline{[\Gamma]})_{fv} \cup \{y\} & (u[\vec{x} \mid y\langle\vec{z}\rangle\,\overline{[\Gamma]}\,])_{bv} = (u\,\overline{[\Gamma]})_{bv}
\end{array}$$

**Definition 5 (Free and Bound Phantom-Variables).** *The free phantom-variables* (−)<sub>fp</sub> *and bound phantom-variables* (−)<sub>bp</sub> *of a pre-term* t *are defined as follows:*

$$\begin{array}{ll}
(x)_{fp} = \{\} & (x)_{bp} = \{\}\\
(s\,t)_{fp} = (s)_{fp} \cup (t)_{fp} & (s\,t)_{bp} = (s)_{bp} \cup (t)_{bp}\\
(x\langle x\rangle.t)_{fp} = (t)_{fp} & (x\langle x\rangle.t)_{bp} = (t)_{bp}\\
(c\langle\vec{x}\rangle.t)_{fp} = (t)_{fp} \cup \{c\} & (c\langle\vec{x}\rangle.t)_{bp} = (t)_{bp}\\
(u[\vec{x} \leftarrow t])_{fp} = (u)_{fp} \cup (t)_{fp} & (u[\vec{x} \leftarrow t])_{bp} = (u)_{bp} \cup (t)_{bp}\\
(u[\vec{x} \mid c\langle c\rangle\,\overline{[\Gamma]}\,])_{fp} = (u\,\overline{[\Gamma]})_{fp} - \{\vec{x}\} & (u[\vec{x} \mid c\langle c\rangle\,\overline{[\Gamma]}\,])_{bp} = (u\,\overline{[\Gamma]})_{bp} \cup \{\vec{x}\}\\
(u[\vec{x} \mid c\langle\vec{y}\rangle\,\overline{[\Gamma]}\,])_{fp} = (u\,\overline{[\Gamma]})_{fp} \cup \{c\} - \{\vec{x}\} & (u[\vec{x} \mid c\langle\vec{y}\rangle\,\overline{[\Gamma]}\,])_{bp} = (u\,\overline{[\Gamma]})_{bp} \cup \{\vec{x}\}
\end{array}$$

The **free covers** (u)<sub>fc</sub> and **bound covers** (u)<sub>bc</sub> are the covers associated with the free phantom-variables (u)<sub>fp</sub> respectively the bound phantom-variables (u)<sub>bp</sub> of u; that is, if x occurs as x⟨a⃗⟩ in u and x ∈ (u)<sub>fp</sub> then ⟨a⃗⟩ ∈ (u)<sub>fc</sub>. When bound, x and the variables in a⃗ may be alpha-converted independently. When a distributor u[x⃗ ∣ y⟨z⃗⟩[Γ]] binds the phantom-variables x⃗ = x₁,...,xₙ, where each xᵢ occurs as xᵢ⟨a⃗ᵢ⟩ in u, then for technical convenience we may make the covers explicit in the distributor itself, and write

$$u[x_1\langle\vec{a}_1\rangle \dots x_n\langle\vec{a}_n\rangle \mid y\langle\vec{z}\rangle\,\overline{[\Gamma]}\,]\,.$$

The environment [Γ] is expected to bind *exactly* the variables in the covers ⟨a⃗ᵢ⟩. We apply this and other restrictions to define the terms of the calculus.

**Definition 6.** *Terms* t ∈ Λ<sup>S</sup> <sup>a</sup> *are pre-terms with the following constraints*

	- *(b) the variables in* ⋃i≤n{a⃗i} *are free in* u *and bound by* [Γ]*.*
	- *(c) the variables in* {z⃗} *occur freely in the environment* [Γ]*.*

*Example 1.* Here we show some pre-terms that are not terms.

**–** c⟨x⟩.y (violates condition 2)
**–** x y[x, z ← w] (violates condition 3)
**–** e₂⟨w₂⟩.w₂ ((e₁⟨w₁⟩.w₁) z)[e₁⟨w₁⟩, e₂⟨w₂⟩ ∣ c⟨z⟩ [w₁, w₂ ← x⟨x⟩.x y]] (violates condition 4a)

We also work modulo permutation with respect to the variables in the cover of phantom-abstractions. Let x⃗ be a list of variables and let x⃗<sup>P</sup> be a permutation of that list, then the following terms are considered equal.


Fig. 5: Typing derivations for phantom-abstractions and distributors

$$u[\vec{x} \leftarrow t] \sim u[\vec{x}_P \leftarrow t] \qquad\qquad y\langle\vec{x}\rangle.t \sim y\langle\vec{x}_P\rangle.t$$

Terms are typed with the typing system for Λ<sup>S</sup> extended with the *distribution* inference rule. This rule is the result of computationally interpreting the medial rule, as done in [16]. We obtain this variant of the medial rule due to the restriction to implications, and to avoid introducing disjunction to the typing system. The terms of Λ<sup>S</sup><sub>a</sub> are then typed as in both Figure 3 and Figure 5. Note that environments are typed by the derivations of all their closures, composed horizontally with the conjunction connective. Also note that the case for a phantom-abstraction is similar to that of an abstraction, where we replace one occurrence of the simple type A by the conjunction Γ.

#### **3.1 Compilation and Readback.**

We now define the translations between Λ<sup>S</sup><sub>a</sub> and the original λ-calculus. First we define the interpretation Λ → Λ<sup>S</sup><sub>a</sub> (*compilation*). Intuitively, it replaces each abstraction λx.− with the term x⟨x⟩.−[x₁,...,xₙ ← x], where x₁,...,xₙ replace the occurrences of x. Actual substitutions are denoted as {t/x}. Let ∣M∣<sub>x</sub> denote the number of occurrences of x in M, and if ∣M∣<sub>x</sub> = n let M<sup>n</sup><sub>x</sub> denote M with the occurrences of x replaced by fresh, distinct variables x₁,...,xₙ. First, the translation of a *closed* term M is (M)′, defined below.

**Definition 7 (Compilation).** *The interpretation of* λ*-terms,* ⌜−⌝ ∶ Λ → Λ<sup>S</sup><sub>a</sub>*, is defined as*

$$\ulcorner M \urcorner = \big(M{}^{n_1}_{x_1}\!\cdots{}^{n_k}_{x_k}\big)'\,[x_1^1,\dots,x_1^{n_1} \leftarrow x_1]\dots[x_k^1,\dots,x_k^{n_k} \leftarrow x_k]$$

*where* x₁,...,x<sub>k</sub> *are the free variables of* M *such that* ∣M∣<sub>xᵢ</sub> = nᵢ > 1*, and* (−)′ *is defined on terms as (where* n ≠ 1 *in the abstraction case):*

$$(x)' = x \qquad (M\,N)' = (M)'\,(N)' \qquad (\lambda x.M)' = \begin{cases} x\langle x\rangle.(M)' & \text{if } |M|_x = 1\\ x\langle x\rangle.(M^n_x)'[x_1,\dots,x_n \leftarrow x] & \text{if } |M|_x = n \end{cases}$$
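Definition 7 can be rendered operationally as follows. This Python sketch is our own illustration (terms as tuples, with an assumed `'share'` node standing for t[x₁,...,xₙ ← x]), not the paper's notation:

```python
def occurrences(t, x):
    """Count the free occurrences |t|_x of x in t."""
    tag = t[0]
    if tag == 'var':
        return 1 if t[1] == x else 0
    if tag == 'lam':
        return 0 if t[1] == x else occurrences(t[2], x)
    return occurrences(t[1], x) + occurrences(t[2], x)

def rename(t, x, names, i=0):
    """Replace the free occurrences of x, left to right, by names[0], names[1], ..."""
    tag = t[0]
    if tag == 'var':
        return (('var', names[i]), i + 1) if t[1] == x else (t, i)
    if tag == 'lam':
        if t[1] == x:
            return t, i
        b, i = rename(t[2], x, names, i)
        return ('lam', t[1], b), i
    f, i = rename(t[1], x, names, i)
    a, i = rename(t[2], x, names, i)
    return ('app', f, a), i

def compile_(t):
    """(−)′ of Definition 7: λx.M keeps a single occurrence as is,
    and otherwise introduces a sharing [x1,...,xn <- x]."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'app':
        return ('app', compile_(t[1]), compile_(t[2]))
    x, body = t[1], t[2]
    n = occurrences(body, x)
    if n == 1:
        return ('lam', x, compile_(body))
    fresh = tuple(f"{x}_{i}" for i in range(1, n + 1))
    body2, _ = rename(body, x, fresh)
    return ('lam', x, ('share', compile_(body2), fresh, x))
```

Note that the n ≠ 1 branch also covers n = 0, producing an empty sharing (a weakening).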

The readback into the λ-calculus is slightly more complicated, specifically due to the bindings induced by the distributor. Interpreting a distributor construct as a λ-term requires (1) converting the phantom-abstractions it binds in u into abstractions, (2) collapsing the environment, and (3) maintaining the bindings between the converted abstractions and the intended variables located in the environment.

**Definition 8.** *Given a total function* σ *with domain* D *and codomain* C*, we overwrite the function with a case* x ↦ v*, where* x ∈ D *and* v ∈ C*, such that*

$$\sigma[x \mapsto v](z) \;:=\; \text{if } (x = z) \text{ then } v \text{ else } \sigma(z)$$

We use the map σ as part of the translation; the intuition is that for every bound variable x in the term we are translating, we should have σ(x) = x. The purpose of the map γ is to keep track of the binding of phantom-variables.
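The overwriting of Definition 8 is just functional update; a minimal Python sketch (our illustration):

```python
def overwrite(sigma, x, v):
    """σ[x ↦ v](z) := if x = z then v else σ(z)  (Definition 8)."""
    return lambda z: v if z == x else sigma(z)

# The readback intuition above: start from the identity map, so that
# sigma(x) = x for bound variables, then overwrite case by case.
sigma = overwrite(lambda z: z, 'x', 'N')
```

Each overwrite shadows earlier cases for the same variable, matching the left-to-right indexing σ[xᵢ ↦ ...]<sub>i∈[n]</sub> used in Definition 9.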

**Definition 9.** *The interpretation* ⟦− ∣ − ∣ −⟧ ∶ Λ<sup>S</sup><sub>a</sub> × (V → Λ) × (V → V) → Λ *is defined as*

$$\begin{array}{rcl}
\llbracket x \mid \sigma \mid \gamma \rrbracket &=& \sigma(x)\\
\llbracket s\,t \mid \sigma \mid \gamma \rrbracket &=& \llbracket s \mid \sigma \mid \gamma \rrbracket\;\llbracket t \mid \sigma \mid \gamma \rrbracket\\
\llbracket c\langle c\rangle.t \mid \sigma \mid \gamma \rrbracket &=& \lambda c.\llbracket t \mid \sigma[c \mapsto c] \mid \gamma \rrbracket\\
\llbracket c\langle x_1,\dots,x_n\rangle.t \mid \sigma \mid \gamma \rrbracket &=& \lambda c.\llbracket t \mid \sigma[x_i \mapsto \sigma(x_i)\{c/\gamma(c)\}]_{i\in[n]} \mid \gamma \rrbracket\\
\llbracket u[x_1,\dots,x_n \leftarrow t] \mid \sigma \mid \gamma \rrbracket &=& \llbracket u \mid \sigma[x_i \mapsto \llbracket t \mid \sigma \mid \gamma \rrbracket]_{i\in[n]} \mid \gamma \rrbracket\\
\llbracket u[e_1\langle\vec{w}_1\rangle,\dots,e_n\langle\vec{w}_n\rangle \mid c\langle c\rangle\,\overline{[\Gamma]}\,] \mid \sigma \mid \gamma \rrbracket &=& \llbracket u\,\overline{[\Gamma]} \mid \sigma \mid \gamma[e_i \mapsto c]_{i\in[n]} \rrbracket\\
\llbracket u[e_1\langle\vec{w}_1\rangle,\dots,e_n\langle\vec{w}_n\rangle \mid c\langle x_1,\dots,x_m\rangle\,\overline{[\Gamma]}\,] \mid \sigma \mid \gamma \rrbracket &=& \llbracket u\,\overline{[\Gamma]} \mid \sigma' \mid \gamma[e_i \mapsto c]_{i\in[n]} \rrbracket
\end{array}$$

*where* σ′ = σ[xᵢ ↦ σ(xᵢ){c/γ(c)}]<sub>i∈[m]</sub>

The following proposition justifies working modulo permutation equivalence.

**Proposition 1.** *For* s, t ∈ Λ<sup>S</sup><sub>a</sub>*, if* s ∼ t *then* ⟦s⟧ = ⟦t⟧*.*

#### **3.2 Rewrite Rules.**

Both the spinal atomic λ-calculus and the atomic λ-calculus of [16] follow atomic reduction steps, i.e. they apply on individual constructors. The biggest difference is that our calculus is capable of duplicating not only the skeleton but also the spine. The rewrite rules in our calculus make use of three operations: *substitution*, *book-keeping*, and *exorcism*. The operation **substitution** t{s/x} propagates through the term t and replaces the free occurrences of the variable x with the term s. Moreover, if x occurs in the cover of a phantom-variable e⟨y⃗ ⋅ x⟩, then substitution replaces the x in the cover with (s)<sub>fv</sub>, resulting in e⟨y⃗ ⋅ (s)<sub>fv</sub>⟩. Although substitution performs some book-keeping on phantom-abstractions, we define an explicit notion of **book-keeping** {y⃗/e}<sub>b</sub> that updates the variables stored in a free cover, i.e. for a term t, if e⟨x⃗⟩ ∈ (t)<sub>fc</sub> then e⟨y⃗⟩ ∈ (t{y⃗/e}<sub>b</sub>)<sub>fc</sub>. The last operation we introduce is called **exorcism** {c⟨x⃗⟩}<sub>e</sub>. We perform exorcisms on phantom-abstractions to convert them to abstractions. Intuitively, this will be performed on phantom-abstractions whose phantom-variables are bound to a distributor when said distributor is eliminated. It converts a phantom-abstraction to an abstraction by introducing a sharing of the phantom-variable that captures the variables in the cover, i.e. (c⟨x⃗⟩.t){c⟨x⃗⟩}<sub>e</sub> = c⟨c⟩.t[x⃗ ← c].
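The exorcism equation (c⟨x⃗⟩.t){c⟨x⃗⟩}<sub>e</sub> = c⟨c⟩.t[x⃗ ← c] can be transcribed directly; the following is again a self-contained Python sketch of ours, not the paper's artifact:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Phantom:
    var: str
    cover: Tuple[str, ...]
    body: object

@dataclass(frozen=True)
class Close:
    body: object
    xs: Tuple[str, ...]
    shared: object  # here: the variable that the sharing captures

def exorcise(p):
    """(c⟨x1,...,xn⟩.t){c⟨x1,...,xn⟩}_e = c⟨c⟩.t[x1,...,xn ← c]:
    the cover variables become shared occurrences of the now-real binder c."""
    return Phantom(p.var, (p.var,), Close(p.body, p.cover, p.var))
```

After exorcism the cover is exactly the binder itself, so the result is an ordinary abstraction in the sense of λx.t ≡ x⟨x⟩.t.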

**Proposition 2.** *The translation* ⟦u ∣ σ ∣ γ⟧ *commutes with substitutions, book-keepings (1), and exorcisms (2) in the following way:*

$$\llbracket u\{t/x\} \mid \sigma \mid \gamma \rrbracket = \llbracket u \mid \sigma[x \mapsto \llbracket t \mid \sigma \mid \gamma \rrbracket] \mid \gamma \rrbracket$$

$$\llbracket u\{\vec{x}/c\}_b \mid \sigma \mid \gamma \rrbracket = \llbracket u \mid \sigma \mid \gamma \rrbracket$$

$$\llbracket u\{c\langle x_1,\dots,x_n\rangle\}_e \mid \sigma \mid \gamma \rrbracket = \llbracket u \mid \sigma[x_i \mapsto c]_{i\in[n]} \mid \gamma \rrbracket$$

(1) *Given* c⟨y⃗⟩ ∈ (u)<sub>fc</sub> *where* x⃗ ⊆ y⃗ *and for* z ∈ y⃗/x⃗*,* γ(c) ∉ (σ(z))<sub>fv</sub>
(2) *Given* c⟨x⃗⟩ ∈ (u)<sub>fc</sub> *where* {x⃗} ∩ (u)<sub>fv</sub> = {}

*Proof. See [25], proofs of Propositions 18, 19, 20, 21.*

Using these operations, we define the rewrite rules that allow for spinal duplication. Firstly we have beta reduction (↝β), which strictly requires an abstraction (not a phantom).

$$(x\langle x\rangle.t)\,s \leadsto_\beta t\{s/x\} \tag{$\beta$}$$

In the corresponding derivation rewrite, the derivation of s, with conclusion A, is composed above the premise A<sup>x</sup> in the derivation of t.

Here β-reduction is a linear operation, since the bound variable x occurs exactly once in the body t. Any duplication of the term t in the atomic λ-calculus proceeds via the sharing reductions.

The first set of sharing reduction rules moves closures towards the outside of a term. Most of these rewrite rules only change the typing derivations in the way that subderivations are composed, with the exception of moving a closure out of the scope of a distributor.

$$s[\Gamma]\,t \leadsto_L (s\,t)[\Gamma] \tag{$l_1$}$$

$$s\,t[\Gamma] \leadsto_L (s\,t)[\Gamma] \tag{$l_2$}$$

$$d\langle\vec{x}\rangle.t[\Gamma] \leadsto_L (d\langle\vec{x}\rangle.t)[\Gamma] \quad \text{if } \{\vec{x}\} \cap (t)_{fv} = \{\vec{x}\} \tag{$l_3$}$$

$$u[\vec{x} \leftarrow t[\Gamma]] \leadsto_L u[\vec{x} \leftarrow t][\Gamma] \tag{$l_4$}$$

For the case of lifting a closure outside a distributor, we use the notation ∥[Γ]∥ for the variables captured by a closure, i.e. ∥[x⃗ ← t]∥ = {x⃗} and ∥[e₁⟨x⃗₁⟩,...,eₙ⟨x⃗ₙ⟩ ∣ c⟨c⟩[Γ]]∥ = {x⃗₁,...,x⃗ₙ}. Then let {z⃗} = ∥[Γ]∥ in the following rewrite rule, where we remove z⃗ from the covers; this can only occur if {x⃗} ∩ ([Γ])<sub>fv</sub> = {}.

$$\begin{aligned}&u[e_1\langle\vec{w}_1\rangle \dots e_n\langle\vec{w}_n\rangle \mid c\langle\vec{x}\rangle\,\overline{[\Gamma]}\,[\Gamma]]\\&\quad\leadsto_L\; u\{\vec{w}_i{\setminus}\vec{z}/e_i\}_{b,\,i\in[n]}\,[e_1\langle\vec{w}_1{\setminus}\vec{z}\rangle \dots e_n\langle\vec{w}_n{\setminus}\vec{z}\rangle \mid c\langle\vec{x}\rangle\,\overline{[\Gamma]}\,]\,[\Gamma]\end{aligned} \tag{$l_5$}$$

The graphical version of this rule is shown in Figure 4c, where we remove the edge only if there is no edge between t and the unsharing node. The proof rewrite rule corresponding to the rewrite rule l₅ can be broken down into two parts. The first part readjusts how the derivations compose.

The second part of the rewrite rule justifies the need for the book-keeping operation. In the rewrite below, let A be the type of a variable z with z ∈ z⃗. After lifting, we want to remove the variable from the cover, to ensure correctness, since the variables in the cover denote the variables captured by the environment. Book-keeping allows us to remove these variables simultaneously.

In the corresponding derivation rewrite, the conjunct A (the type of z) is lifted out of the distributor derivation, from under the distribution rule to the top level, so that it is no longer distributed over the implications C → Σᵢ.

The lifting rules (lᵢ) are justified by the need to lift closures out of the distributor, as opposed to duplicating them. In the second set of rewrite rules, consecutive sharings are compounded, and unary sharings are applied as substitutions. For simplicity, in the equivalent proof rewrite step we only show the binary case.

$$u[\vec{w}\leftarrow y][y\cdot\vec{y}\leftarrow t]\sim\_C u[\vec{w}\cdot\vec{y}\leftarrow t] \tag{c\_1}$$

$$u[x \gets t] \sim\_C u\{t/x\} \tag{c\_2}$$

$$\dfrac{A}{\dfrac{A}{A \wedge A} \wedge A} \;\rightsquigarrow\_C\; \dfrac{A}{A \wedge A \wedge A}$$

The atomic steps for duplication are given in the third and final set of rewrite rules. The first is the atomic duplication step for an application, which is the same rule used in [16]. The rules are also shown graphically in (respectively) Figure 4b (where we maintain links between sharings and unsharings), Figure 4a, and Figure 4d (where the unsharing node is linked to exactly one connecting sharing node).

$$u[x\_1 \dots x\_n \leftarrow s\,t] \rightsquigarrow\_D u\{z\_1\,y\_1/x\_1\} \dots \{z\_n\,y\_n/x\_n\}[z\_1 \dots z\_n \leftarrow s][y\_1 \dots y\_n \leftarrow t] \tag{d\_1}$$


$$u[x\_1, \ldots, x\_n \leftarrow c\langle\vec{y}\rangle.t] \rightsquigarrow\_D u\{e\_i\langle\vec{w\_i}\rangle.w\_i/x\_i\}\_{i \in [n]}\,[e\_1\langle\vec{w\_1}\rangle \ldots e\_n\langle\vec{w\_n}\rangle \mid c\langle\vec{y}\rangle\,[\vec{w\_1}, \ldots, \vec{w\_n} \leftarrow t]] \tag{d\_2}$$

$$u[e\_1\langle\vec{w\_1}\rangle \ldots e\_n\langle\vec{w\_n}\rangle \mid c\langle c\rangle\,[\vec{w\_1},\ldots,\vec{w\_n} \leftarrow c]] \rightsquigarrow\_D u\{e\_1\langle\vec{w\_1}\rangle\}\_e \ldots \{e\_n\langle\vec{w\_n}\rangle\}\_e \tag{d\_3}$$


*Example 2.* The following example, illustrated in Figure 4e, is a reduction in the term calculus where we duplicate the spine of the term [a1, a<sup>2</sup> ←λx.λy.((λz.z)y)x].

↝D {x1⟨b1⟩.b1/a1}{x2⟨b2⟩.b2/a2} [x1⟨b1⟩, x2⟨b2⟩ ∣ x⟨x⟩[b1, b2 ← λy.((λz.z)y)x]]
↝D {x1⟨c1⟩.y1⟨c1⟩.c1/a1}{x2⟨c2⟩.y2⟨c2⟩.c2/a2} [x1⟨c1⟩, x2⟨c2⟩ ∣ x⟨x⟩[y1⟨c1⟩, y2⟨c2⟩ ∣ y⟨y⟩[c1, c2 ← ((λz.z)y)x]]]
↝D {x1⟨d1, e1⟩.y1⟨d1, e1⟩.d1e1/a1}{x2⟨d2, e2⟩.y2⟨d2, e2⟩.d2e2/a2} [x1⟨d1, e1⟩, x2⟨d2, e2⟩ ∣ x⟨x⟩[y1⟨d1, e1⟩, y2⟨d2, e2⟩ ∣ y⟨y⟩[d1, d2 ← (λz.z)y][e1, e2 ← x]]]
↝L {x1⟨d1, e1⟩.y1⟨d1⟩.d1e1/a1}{x2⟨d2, e2⟩.y2⟨d2⟩.d2e2/a2} [x1⟨d1, e1⟩, x2⟨d2, e2⟩ ∣ x⟨x⟩[y1⟨d1⟩, y2⟨d2⟩ ∣ y⟨y⟩[d1, d2 ← (λz.z)y]][e1, e2 ← x]]
↝L {x1⟨e1⟩.y1⟨d1⟩.d1e1/a1}{x2⟨e2⟩.y2⟨d2⟩.d2e2/a2} [x1⟨e1⟩, x2⟨e2⟩ ∣ x⟨x⟩[e1, e2 ← x]][y1⟨d1⟩, y2⟨d2⟩ ∣ y⟨y⟩[d1, d2 ← (λz.z)y]]
↝D {λx1.y1⟨d1⟩.d1x1/a1}{λx2.y2⟨d2⟩.d2x2/a2} [y1⟨d1⟩, y2⟨d2⟩ ∣ y⟨y⟩[d1, d2 ← (λz.z)y]]

Reduction (↝(L,C,D,β)) preserves the conclusion of the derivation, and thus the following proposition is easy to observe.

**Proposition 3.** *If* s ↝(L,C,D,β) t *and* s ∶ A*, then* t ∶ A*.*

**Definition 10.** *For a term* t ∈ Λ<sup>S</sup><sub>a</sub>*, if there does not exist a term* s ∈ Λ<sup>S</sup><sub>a</sub> *such that* t ↝(L,C,D) s*, then* t *is said to be in sharing normal form.*

The following lemma not only shows that the translations of Section 3.1 behave well, but also that duplication preserves the denotation.

**Lemma 1.** *For a* t ∈ Λ<sup>S</sup><sub>a</sub> *in sharing normal form and* N ∈ Λ*:*

$$\llbracket \lceil N \rceil \rrbracket = N \qquad \lceil \llbracket t \rrbracket \rceil = t \qquad \exists M \in \Lambda.\ \llbracket t \rrbracket = M$$

*Otherwise, if* s ↝(L,D,C) t*, then* ⟦s ∣ σ ∣ γ⟧ = ⟦t ∣ σ ∣ γ⟧*.*

*Proof.* See [25, Lemma 24, Lemma 25].

**Lemma 2.** *Given a term* t ∈ Λ<sup>S</sup><sub>a</sub>*, then* ⌈⟦t⟧⌉ *is in sharing normal form.*

*Proof.* By induction on the longest sharing reduction path from t. The base case is covered by Lemma 1. We are then interested in the inductive case, where t is not in sharing normal form. By Lemma 1, ⟦t⟧ = ⟦t′⟧ where t ↝(D,L,C) t′. By the induction hypothesis, ⌈⟦t′⟧⌉ is in sharing normal form. Hence ⌈⟦t⟧⌉ is in sharing normal form. ◻

## **4 Strong Normalisation of Sharing Reductions**

In order to show that our calculus is strongly normalising, we first show that the sharing reduction rules are strongly normalising. We define a measure on terms and show that this measure strictly decreases as sharing reduction progresses. Similar ideas and results can be found elsewhere: with *memory* in [20], the λ*-*I *calculus* in [6], the λ*-void calculus* in [2], and the weakening λμ-calculus in [17]. Our measure consists of three components. First, the **height** of a term is a multiset of integers that measures the number of constructors from each sharing node to the root of the term in its graphical notation. The height is defined on terms as H<sup>i</sup>(−), where i is an integer, and we write H(t) for H<sup>1</sup>(t). We use ⊍ to denote the disjoint union of two multisets, and we write H<sup>i</sup>([Γ]) for H<sup>i</sup>([Γ1]) ⊍⋅⋅⋅⊍ H<sup>i</sup>([Γn]), where [Γ] = [Γ1],...,[Γn] is an environment.

**Definition 11 (Sharing Height).** *The sharing height* H<sup>i</sup> (t) *of a term* t *is given below, where* n *is the number of closures in* [Γ]*:*

$$\begin{aligned} \mathcal{H}^i(x) &= \{\} & \mathcal{H}^i(s\,t) &= \mathcal{H}^{i+1}(s) \uplus \mathcal{H}^{i+1}(t) \\ \mathcal{H}^i(c\langle\vec{x}\rangle.t) &= \mathcal{H}^{i+1}(t) & \mathcal{H}^i(t[\Gamma]) &= \mathcal{H}^i(t) \uplus \mathcal{H}^i([\Gamma]) \uplus \{i^1\} \\ \mathcal{H}^i([x\_1, \ldots, x\_n \leftarrow t]) &= \mathcal{H}^{i+1}(t) & \mathcal{H}^i([\vec{y} \mid c\langle\vec{x}\rangle\,\overline{[\Gamma]}]) &= \mathcal{H}^{i+1}(\overline{[\Gamma]}) \uplus \{(i+1)^n\} \end{aligned}$$

This measure strictly decreases under the rewrite rules l1, l2, l3, l<sup>4</sup> and l5, i.e. if t ↝<sup>L</sup> u then H<sup>i</sup>(t) > H<sup>i</sup>(u). The second measure we consider is the **weight** of a term. Intuitively, this quantifies the remaining duplications, which are performed by ↝<sup>D</sup> reductions. If a term would be deleted, we assign it weight 1 to express that it is not duplicated. Calculating the weight requires an auxiliary function V<sup>i</sup>(−), where i is an integer, that assigns integer weights to the variables of a term. Measuring variables independently of binders is vital: it allows us to measure distributors, which duplicate λ's but not the bound variable. Also, only variables bound by abstractions are measured, since variables bound by sharings are substituted in the interpretation.

**Definition 12 (Variable Weights).** *The function* V<sup>i</sup>(t) *returns a function that assigns integer weights to the free variables of* t*. It is defined below, where* f = V<sup>i</sup>(t) *and* g = f(x1) +⋅⋅⋅+ f(xn) *for* x<sup>i</sup> ∈ x⃗*.*

$$\begin{aligned} \mathcal{V}^i(x) &= \{x \mapsto i\} & \mathcal{V}^i(s\,t) &= \mathcal{V}^i(s) \cup \mathcal{V}^i(t) \\ \mathcal{V}^i(c\langle c\rangle.t) &= \mathcal{V}^i(t)/\{c\} & \mathcal{V}^i(c\langle\vec{x}\rangle.t) &= \mathcal{V}^i(t) \cup \{c \mapsto i\} \\ \mathcal{V}^i(t[\vec{x} \leftarrow s]) &= \mathcal{V}^i(t)/\{\vec{x}\} \cup \mathcal{V}^g(s) & \mathcal{V}^i(t[\leftarrow s]) &= \mathcal{V}^i(t) \cup \mathcal{V}^1(s) \\ \mathcal{V}^i(t[e\_1\langle\vec{w\_1}\rangle \dots e\_n\langle\vec{w\_n}\rangle \mid c\langle c\rangle\,\overline{[\Gamma]}]) &= \mathcal{V}^i(t\,\overline{[\Gamma]})/\{c, e\_1, \dots, e\_n\} \\ \mathcal{V}^i(t[e\_1\langle\vec{w\_1}\rangle \dots e\_n\langle\vec{w\_n}\rangle \mid c\langle\vec{x}\rangle\,\overline{[\Gamma]}]) &= \mathcal{V}^i(t\,\overline{[\Gamma]})/\{e\_1, \dots, e\_n\} \end{aligned}$$

The weight of a term can then be defined using this auxiliary function. It is used when calculating the weight of a sharing, where the weight of the variables bound by the sharing plays a significant role in the weight of the shared term. In the case of a weakening [← t], we assign an initial weight of 1. Again we write W(t) for W<sup>1</sup>(t).

**Definition 13 (Sharing Weight).** *The sharing weight* W<sup>i</sup> (t) *of a term* t *is a multiset of integers computed by the function defined below, where* f = V<sup>i</sup> (t) *and* g = f(x1) +⋅⋅⋅+ f(xn) *for each* x<sup>i</sup> ∈ x⃗*.*

$$\begin{aligned} \mathcal{W}^i(x) &= \{\} & \mathcal{W}^i(s\,t) &= \mathcal{W}^i(s) \cup \mathcal{W}^i(t) \cup \{i\} \\ \mathcal{W}^i(c\langle c\rangle.t) &= \mathcal{W}^i(t) \cup \{i\} \cup \{\mathcal{V}^i(t)(c)\} & \mathcal{W}^i(c\langle\vec{x}\rangle.t) &= \mathcal{W}^i(t) \cup \{i\} \\ \mathcal{W}^i(t[\vec{x} \leftarrow s]) &= \mathcal{W}^i(t) \cup \mathcal{W}^g(s) & \mathcal{W}^i(t[\leftarrow s]) &= \mathcal{W}^i(t) \cup \mathcal{W}^1(s) \\ \mathcal{W}^i(t[e\_1\langle\vec{w\_1}\rangle \dots e\_n\langle\vec{w\_n}\rangle \mid c\langle c\rangle\,\overline{[\Gamma]}]) &= \mathcal{W}^i(t\,\overline{[\Gamma]}) \cup \{\mathcal{V}^i(t\,\overline{[\Gamma]})(c)\} \\ \mathcal{W}^i(t[e\_1\langle\vec{w\_1}\rangle \dots e\_n\langle\vec{w\_n}\rangle \mid c\langle\vec{x}\rangle\,\overline{[\Gamma]}]) &= \mathcal{W}^i(t\,\overline{[\Gamma]}) \end{aligned}$$

This measure then strictly decreases on the rewrite rules d1, d2, d<sup>3</sup> and is unaffected by all the other sharing reduction rules, i.e. if t ↝<sup>D</sup> u then W<sup>i</sup> (t) > W<sup>i</sup> (u). If t ↝(L,C) u then W<sup>i</sup> (t) = W<sup>i</sup> (u). The third and last measure we consider is the **number of closures** in the term, where it can be easily observed that the rewrite rules c<sup>1</sup> and c<sup>2</sup> strictly decrease this measure, and that the ↝<sup>L</sup> rules do not alter the number of closures. We then use this along with height and weight to define a *sharing measure* on terms.

**Definition 14.** *The sharing measure of a* Λ<sup>S</sup><sub>a</sub>*-term* t *is the triple (*W(t)*,* C*,* H(t)*), where* C *is the number of closures in* t*. Sharing measures are compared lexicographically, with precedence* W > C > H*.*
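As a small illustration (our own encoding, not from the paper), the comparison of sharing measures can be sketched in a few lines. For finite multisets of integers, the multiset ordering agrees with lexicographic comparison of the elements sorted in descending order, so plain tuple comparison does the job.

```python
# Sketch of comparing sharing measures (W, C, H) lexicographically.
# Multisets are encoded as plain lists of integers (our choice).

def multiset_key(ms):
    # Sort descending so that larger elements dominate the comparison.
    return sorted(ms, reverse=True)

def measure_key(weight, closures, height):
    # Lexicographic preference W > C > H from Definition 14.
    return (multiset_key(weight), closures, multiset_key(height))

# A duplication step strictly decreases the weight component, even if
# the height component grows (the numbers here are illustrative only).
before = measure_key([3, 2, 2], 2, [1, 1])
after = measure_key([2, 2, 1], 2, [1, 1, 1])
assert after < before
```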

**Theorem 1.** *Sharing reduction* ↝(D,L,C) *is strongly normalising.*

Now that we have proven the sharing reductions are strongly normalising, we can prove that they are confluent for closed terms.

**Theorem 2.** *The sharing reduction relation* ↝(D,L,C) *is confluent.*

*Proof.* Lemma 1 tells us that the denotation is preserved under reduction, i.e. for s ↝(D,L,C) t we have ⟦s⟧ = ⟦t⟧. Therefore, given t ↝<sup>∗</sup>(D,L,C) s<sub>1</sub> and t ↝<sup>∗</sup>(D,L,C) s<sub>2</sub>, we have ⟦t⟧ = ⟦s<sub>1</sub>⟧ = ⟦s<sub>2</sub>⟧. Since sharing reductions are strongly normalising, there exist terms u<sub>1</sub> and u<sub>2</sub> in sharing normal form such that s<sub>1</sub> ↝<sup>∗</sup>(D,L,C) u<sub>1</sub> and s<sub>2</sub> ↝<sup>∗</sup>(D,L,C) u<sub>2</sub>. Lemma 1 tells us that terms in sharing normal form are in correspondence with their denotations, i.e. ⌈⟦t⟧⌉ = t. Since by the above ⟦u<sub>1</sub>⟧ = ⟦s<sub>1</sub>⟧ = ⟦s<sub>2</sub>⟧ = ⟦u<sub>2</sub>⟧, and by Lemma 1 ⌈⟦u<sub>1</sub>⟧⌉ = u<sub>1</sub> and ⌈⟦u<sub>2</sub>⟧⌉ = u<sub>2</sub>, we conclude u<sub>1</sub> = u<sub>2</sub>. Hence confluence. ◻

## **5 Preservation of Strong Normalisation and Confluence**

A β-step in our calculus may occur within a weakening, and is therefore simulated by zero β-steps in the λ-calculus. Therefore, if there is an infinite reduction path located inside a weakening in Λ<sup>S</sup><sub>a</sub>, then the reduction path is not preserved in the corresponding λ-term, as there are no weakenings. To deal with this, just as done in [2, 16, 17], we make use of the **weakening calculus**. A β-step is non-deleting precisely because of the weakening construct: if a β-step would be deleting, the weakening calculus instead keeps the deleted term around as 'garbage', which can continue to reduce unless explicitly 'garbage-collected' by extra (non-β) reduction steps. PSN has already been shown for the weakening calculus through the use of a perpetual strategy in [16]. Part of proving PSN is then using the weakening calculus to prove that if t ∈ Λ<sup>S</sup><sub>a</sub> has an infinite reduction path, then its translation into the weakening calculus also has an infinite reduction path.

**Definition 15.** *The* <sup>W</sup>*-terms of the weakening calculus (*ΛW*) are*

$$T, U, V ::= x \mid \lambda x.T^{(\*)} \mid U\,V \mid T[\leftarrow U] \mid \bullet \qquad (\*)\ \text{where } x \in (T)\_{fv}$$

The terms are variables, abstractions, applications, weakenings, and a bullet. In the weakening T[← U], the subterm U is *weakened*. The interpretation ⟦− ∣ − ∣ −⟧<sub>W</sub> of atomic terms as weakening terms can be seen as an extension of the translation into the λ-calculus (Definition 9).

**Definition 16.** *The interpretation* ⟦− ∣ − ∣ −⟧<sub>W</sub> ∶ Λ<sup>S</sup><sub>a</sub> × (V → Λ<sub>W</sub>) × (V → V) → Λ<sub>W</sub> *with maps* σ ∶ V → Λ<sub>W</sub> *and* γ ∶ V → V *is defined as an extension of the translation of Definition 9 with the following additional cases.*

$$\begin{aligned} \llbracket u[\leftarrow t] \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}} &= \llbracket u \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}}[\leftarrow \llbracket t \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}}] \\ \llbracket u[\ \mid c\langle c\rangle\,\overline{[\Gamma]}] \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}} &= \llbracket u\,\overline{[\Gamma]} \mid \sigma[c \mapsto \bullet] \mid \gamma \rrbracket\_{\mathcal{W}} \\ \llbracket u[\ \mid c\langle x\_1, \ldots, x\_n\rangle\,\overline{[\Gamma]}] \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}} &= \llbracket u\,\overline{[\Gamma]} \mid \sigma' \mid \gamma \rrbracket\_{\mathcal{W}} \end{aligned}$$

*where* σ′ (z) ∶= *if* z ∈ {x1,...,xn} *then* σ(z){●/γ(c)} *else* σ(z)

We write ⟦t⟧<sub>W</sub> for ⟦t ∣ I ∣ I⟧<sub>W</sub>, where I is the identity function. We also have translations of the weakening calculus to and from the λ-calculus; both were provided in [16]. The interpretation ⌊−⌋ from weakening terms to λ-terms discards all weakenings.

**Definition 17.** *The interpretation* ⌈−⌉<sup>W</sup> ∶ Λ → Λ<sub>W</sub> *of* M ∈ Λ *is defined below.*

$$\lceil x \rceil^{\mathcal{W}} = x \quad \lceil M\,N \rceil^{\mathcal{W}} = \lceil M \rceil^{\mathcal{W}} \lceil N \rceil^{\mathcal{W}} \quad \lceil \lambda x.N \rceil^{\mathcal{W}} = \begin{cases} \lambda x.\lceil N \rceil^{\mathcal{W}} & \text{if } x \in (N)\_{fv} \\ \lambda x.\lceil N \rceil^{\mathcal{W}}[\leftarrow x] & \text{otherwise} \end{cases}$$

The following equalities can be observed, where σ<sup>Λ</sup>(z) = ⌊ σ<sup>W</sup>(z) ⌋.

**Proposition 4.** *For* N ∈ Λ *and* t ∈ Λ<sup>S</sup><sub>a</sub>*, the following properties hold:*

$$\lfloor \llbracket t \mid \sigma^{\mathcal{W}} \mid \gamma \rrbracket\_{\mathcal{W}} \rfloor = \llbracket t \mid \sigma^{\Lambda} \mid \gamma \rrbracket \qquad \llbracket \lceil N \rceil \rrbracket\_{\mathcal{W}} = \lceil N \rceil^{\mathcal{W}}$$

*where for each* {x ↦ M} ∈ σ<sup>W</sup>*,* {x ↦ ⌊M ⌋} ∈ σ<sup>Λ</sup>*.*

**Definition 18.** *In the weakening calculus,* β*-reduction is defined as follows, where* [Γ] *is a sequence of weakening constructs:*

$$((\lambda x.T)[\Gamma])\,U \to\_\beta T\{U/x\}[\Gamma]$$

**Proposition 5.** *If* N ∈ Λ *is strongly normalising, then so is* ⌈N⌉<sup>W</sup>*.*

When translating from Λ<sup>S</sup> <sup>a</sup> to ΛW, weakenings are maintained whilst sharings are interpreted via substitution. Thus the reduction rules in the weakening calculus cover the spinal reductions for nullary distributors and weakenings.

**Definition 19.** *Weakening reduction (*→W*) proceeds as follows.*

$$\begin{aligned} U[\leftarrow T]\,V &\to\_{\mathcal{W}} (U\,V)[\leftarrow T] & U\,V[\leftarrow T] &\to\_{\mathcal{W}} (U\,V)[\leftarrow T] \\ T[\leftarrow U[\leftarrow V]] &\to\_{\mathcal{W}} T[\leftarrow U][\leftarrow V] & T[\leftarrow \lambda x.U] &\to\_{\mathcal{W}} T[\leftarrow U\{\bullet/x\}] \\ T[\leftarrow U\,V] &\to\_{\mathcal{W}} T[\leftarrow U][\leftarrow V] & T[\leftarrow \bullet] &\to\_{\mathcal{W}} T \\ T[\leftarrow U] &\to\_{\mathcal{W}} T\ ^{(1)} & \lambda x.T[\leftarrow U] &\to\_{\mathcal{W}} (\lambda x.T)[\leftarrow U]\ ^{(2)} \end{aligned}$$

*(1) if* U *is a subterm of* T*, and (2) if* x ∉ (U)<sub>fv</sub>*.*
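As an illustration, the garbage-collection rules on the weakened subterm can be sketched on a toy term representation. The encoding and function names below are ours, not the paper's, and the subterm rule (1), the λ-lifting rule (2), and the application-lifting rules on the main term are omitted for brevity.

```python
# Sketch of the garbage-collection fragment of weakening reduction.
# Terms (our encoding): ("var", x), ("lam", x, t), ("app", t, u),
# ("wk", t, u) for t[<- u], and ("bullet",) for the deletion marker.

def subst_bullet(t, x):
    """Replace free occurrences of variable x in t by the bullet."""
    tag = t[0]
    if tag == "var":
        return ("bullet",) if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst_bullet(t[2], x))
    if tag in ("app", "wk"):
        return (tag, subst_bullet(t[1], x), subst_bullet(t[2], x))
    return t  # bullet

def step(t):
    """Try one ->_W step anywhere in t; return the new term or None."""
    if t[0] == "wk":
        body, u = t[1], t[2]
        if u[0] in ("app", "wk"):  # T[<- U V] / T[<- U[<- V]] -> T[<- U][<- V]
            return ("wk", ("wk", body, u[1]), u[2])
        if u[0] == "lam":          # T[<- \x.U] -> T[<- U{bullet/x}]
            return ("wk", body, subst_bullet(u[2], u[1]))
        if u[0] == "bullet":       # T[<- bullet] -> T
            return body
    for i in range(1, len(t)):     # otherwise reduce inside a subterm
        if isinstance(t[i], tuple):
            s = step(t[i])
            if s is not None:
                return t[:i] + (s,) + t[i + 1:]
    return None

def normalize(t):
    while (s := step(t)) is not None:
        t = s
    return t

# z[<- \x. x y]  ->  z[<- bullet y]  ->  z[<- bullet][<- y]  ->  z[<- y]
t = ("wk", ("var", "z"), ("lam", "x", ("app", ("var", "x"), ("var", "y"))))
print(normalize(t))  # -> ('wk', ('var', 'z'), ('var', 'y'))
```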

It is easy to see that these rules correspond to special cases of the sharing reduction rules for Λ<sup>S</sup> <sup>a</sup> . This resemblance is confirmed by the following Lemma, proven in [25, pp. 82-86]. We use this to show how Λ<sup>S</sup> <sup>a</sup> enjoys PSN.

**Lemma 3.** *If* t ↝<sub>β</sub> u *then* ⟦t⟧<sub>W</sub> →<sup>+</sup><sub>β</sub> ⟦u⟧<sub>W</sub>*. If* t ↝(C,D,L) u*, and every* x ∈ (t)<sub>bv</sub> ∪ (t)<sub>fp</sub> *is such that for all* z*,* x ∉ (σ(z))<sub>fv</sub>*, then*

$$\llbracket t \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}} \to\_{\mathcal{W}}^{\*} \llbracket u \mid \sigma \mid \gamma \rrbracket\_{\mathcal{W}}$$

**Lemma 4.** *If* t ∈ Λ<sup>S</sup><sub>a</sub> *has an infinite reduction path, then* ⟦t⟧<sub>W</sub> *also has an infinite reduction path.*

*Proof.* Due to Theorem 1, we know that the infinite reduction path contains infinitely many β-steps: between consecutive β-steps there are only finitely many ↝(D,L,C) reduction steps. Lemma 3 says that each ↝(D,L,C) step in Λ<sup>S</sup><sub>a</sub> corresponds to zero or more weakening reductions (→<sup>∗</sup><sub>W</sub>), and that each β-step in Λ<sup>S</sup><sub>a</sub> corresponds to one or more β-steps in Λ<sub>W</sub>. Therefore ⟦t⟧<sub>W</sub> also has an infinite reduction path. ◻

**Theorem 3.** *If* N ∈ Λ *is strongly normalising, then so is* ⌈N⌉*.*

*Proof.* For a given N ∈ Λ that is strongly normalising, we know by Proposition 5 that ⌈N⌉<sup>W</sup> is strongly normalising. Then ⟦⌈N⌉⟧<sub>W</sub> is strongly normalising, since Proposition 4 states that ⟦⌈N⌉⟧<sub>W</sub> = ⌈N⌉<sup>W</sup>. Then Lemma 4, whose contrapositive states that if ⟦t⟧<sub>W</sub> is strongly normalising then so is t, proves that ⌈N⌉ is strongly normalising. ◻

We also prove confluence, which is already known for the λ-calculus [11]. We first observe that a β-step in the λ-calculus is simulated in Λ<sup>S</sup> <sup>a</sup> by one β-step followed by zero or more sharing reductions.

**Lemma 5.** *Given* N, M ∈ Λ*. If* N ↝<sub>β</sub> M*, then* ⌈N⌉ ↝<sub>β</sub> ↝<sup>∗</sup>(D,L,C) ⌈M⌉*.*

*Proof.* This is proven by Sherratt in [25, Lemma 67].

**Theorem 4.** *Given* t, s1, s<sup>2</sup> ∈ Λ<sup>S</sup> <sup>a</sup> *. If* <sup>t</sup> ↝<sup>∗</sup> (β,D,L,C) <sup>s</sup><sup>1</sup> *and* <sup>t</sup> ↝<sup>∗</sup> (β,D,L,C) s2*, there exists a* u ∈ Λ<sup>S</sup> <sup>a</sup> *such that* <sup>s</sup><sup>1</sup> ↝<sup>∗</sup> (β,D,L,C) <sup>u</sup> *and* <sup>s</sup><sup>2</sup> ↝<sup>∗</sup> (β,D,L,C) u*.*

*Proof.* Suppose t ↝<sup>∗</sup>(β,D,L,C) s<sub>1</sub> and t ↝<sup>∗</sup>(β,D,L,C) s<sub>2</sub>. Then ⟦t⟧ ↝<sup>∗</sup><sub>β</sub> ⟦s<sub>1</sub>⟧ and ⟦t⟧ ↝<sup>∗</sup><sub>β</sub> ⟦s<sub>2</sub>⟧. By the Church–Rosser theorem, there exists an M ∈ Λ such that ⟦s<sub>1</sub>⟧ ↝<sup>∗</sup><sub>β</sub> M and ⟦s<sub>2</sub>⟧ ↝<sup>∗</sup><sub>β</sub> M. Due to Lemma 2, ⌈⟦s<sub>1</sub>⟧⌉ = s′<sub>1</sub> and ⌈⟦s<sub>2</sub>⟧⌉ = s′<sub>2</sub>, where s′<sub>1</sub>, s′<sub>2</sub> ∈ Λ<sup>S</sup><sub>a</sub> are in sharing normal form. Then, thanks to Lemma 5, we know s′<sub>1</sub> ↝<sup>∗</sup>(β,D,L,C) ⌈M⌉ and s′<sub>2</sub> ↝<sup>∗</sup>(β,D,L,C) ⌈M⌉. Combined, this gives confluence. ◻

## **6 Conclusion, related work, and future directions**

We have studied the interaction between the switch and the medial rule, the two characteristic inference rules of deep inference. We built a Curry–Howard interpretation based on this interaction, whose resulting calculus not only has the ability to duplicate terms atomically, but can also duplicate solely the spine of an abstraction so that β-reduction can proceed on the duplicates. We showed that this calculus has natural properties with respect to the λ-calculus.

This work, which started as an investigation into the Curry–Howard correspondence of the switch rule [25], fits into a broader effort to give a computational interpretation to intuitionistic deep-inference proof theory. Brünnler and McKinley [9] give a natural reduction mechanism without medial (or switch), and observe that preservation of strong normalization fails. Guenot and Straßburger [14] investigate a different switch rule, corresponding to the implication-left rule of sequent calculus. He [17] extends the atomic λ-calculus to the λμ-calculus.

Our future goal is to develop the intuitionistic open deduction formalism towards optimal reduction [23, 21, 3], via the remaining medial and switch rules [26].

**Acknowledgements** We thank the anonymous reviewers for their comments.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Learning Weighted Automata over Principal Ideal Domains**⋆

Gerco van Heerdt1, Clemens Kupke2(-), Jurriaan Rot1,3, and Alexandra Silva<sup>1</sup>

> <sup>1</sup> University College London, United Kingdom {gerco.heerdt,alexandra.silva}@ucl.ac.uk <sup>2</sup> University of Strathclyde, United Kingdom clemens.kupke@strath.ac.uk <sup>3</sup> Radboud University, The Netherlands jrot@cs.ru.nl

**Abstract.** In this paper, we study active learning algorithms for weighted automata over a semiring. We show that a variant of Angluin's seminal L<sup>∗</sup> algorithm works when the semiring is a principal ideal domain, but not for arbitrary semirings such as the natural numbers.

## **1 Introduction**

Angluin's seminal L<sup>∗</sup> algorithm [4] for active learning of deterministic automata (DFAs) has been successfully used in many verification tasks, including automatically building formal models of chips in bank cards and finding bugs in network protocols (see [27,14] for a broad overview of successful applications of active learning). While DFAs are expressive enough to capture interesting properties, certain verification tasks require more expressive models. This motivated several researchers to extend L<sup>∗</sup> to other types of automata, notably Mealy machines [28,24], register automata [15,22,1], and nominal automata [20].

Weighted finite automata (WFAs) are an important model made popular due to their applicability in image processing and speech recognition tasks [11,21]. The model is prevalent in other areas, including bioinformatics [2] and formal verification [3]. Passive learning algorithms and associated complexity results have appeared in the literature (see e.g. [5] for an overview), whereas active learning has been less studied [6,7]. Furthermore, the existing learning algorithms, both passive and active, have been developed assuming the weights in the automaton are drawn from a field, such as the real numbers.<sup>4</sup> To the best of our knowledge, no learning algorithms, whether passive or active, have been developed for WFAs in which the weights are drawn from a general semiring.

<sup>-</sup> The research leading to this work was partially funded by the European Union Horizon 2020 research and innovation programme under the ERC Starting Grant ProFoundNet (grant code 679127) and the Marie Sklodowska-Curie Grant Agreement No. 795119, by the EPSRC Standard Grant CLeVer (EP/S028641/1) and by GCHQ via the VeTSS grant "Automated black-box verification of networking systems" (4207703/RFA 15845).

<sup>4</sup> Balle and Mohri [6] define WFAs generically over a semiring but then restrict to fields from Section 3 onwards as they present an overview of existing learning algorithms.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 602–621, 2020. https://doi.org/10.1007/978-3-030-45231-5\_31

In this paper, we explore *active learning* for WFAs over a general semiring. The main contributions of the paper are as follows:


We start in Section 2 by explaining the learning algorithm for WFAs over the reals and pointing out the challenges in extending it to arbitrary semirings.

## **2 Overview of the Approach**

In this section, we give an overview of the work developed in the paper through examples. We start by informally explaining the general algorithm for learning weighted automata that we introduce in Section 4, for the case where the semiring is a field. More specifically, for simplicity we consider the field of real numbers throughout this section. Later in the section, we illustrate why this algorithm does not work for an arbitrary semiring.

Angluin's L<sup>∗</sup> algorithm provides a procedure to learn the minimal DFA accepting a certain (unknown) regular language. In the weighted variant we will introduce in Section 4, for the specific case of the field of real numbers, the algorithm produces the minimal WFA accepting a weighted rational language (or formal power series) <sup>L</sup>: <sup>A</sup><sup>∗</sup> <sup>→</sup> <sup>R</sup>.

A WFA over R consists of a set of states, a linear combination of initial states, a transition function that for each state and input symbol produces a linear combination of successor states, and an output value in R for each state (Definition 5). As an example, consider the WFA over A <sup>=</sup> {a} below.

Here <sup>q</sup><sup>0</sup> is the only initial state, with weight 1, as indicated by the arrow into it that has no origin. When reading <sup>a</sup>, <sup>q</sup><sup>0</sup> transitions with weight 1 to itself and also with weight 1 to <sup>q</sup><sup>1</sup>; <sup>q</sup><sup>1</sup> transitions with weight 2 just to itself. The output of <sup>q</sup><sup>0</sup> is 2 and the output of <sup>q</sup><sup>1</sup> is 3.

The language of a WFA is determined by letting it read a given word and determining the final output according to the weights and outputs assigned to individual states. More precisely, suppose we want to read the word aaa in the example WFA above. Initially, <sup>q</sup><sup>0</sup> is assigned weight 1 and <sup>q</sup><sup>1</sup> weight 0. Processing the first <sup>a</sup> then leads to <sup>q</sup><sup>0</sup> retaining weight 1, as it has a self-loop with weight 1, and <sup>q</sup><sup>1</sup> obtaining weight 1 as well. With the next <sup>a</sup>, the weight of <sup>q</sup><sup>0</sup> still remains 1, but the weight of <sup>q</sup><sup>1</sup> doubles due to its self-loop of weight 2 and is added to the weight 1 coming from <sup>q</sup>0, leading to a total of 3. Similarly, after the last a the weights are 1 for <sup>q</sup><sup>0</sup> and 7 for <sup>q</sup>1. Since <sup>q</sup><sup>0</sup> has output 2 and <sup>q</sup><sup>1</sup> output 3, the final result is 2 · 1+3 · 7 = 23.
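The computation just described can be sketched in a few lines of code; the vector/matrix encoding and all names below are ours, not the paper's. Reading a letter multiplies the current weight vector by that letter's transition matrix, and the final output is the weighted sum of the state outputs.

```python
# Sketch of WFA evaluation over the reals (our encoding): i is the
# vector of initial weights, delta[a] the transition matrix for letter
# a, and o the vector of output weights.

def wfa_value(word, i, delta, o):
    """Weight the WFA assigns to `word`."""
    weights = list(i)
    for a in word:
        m = delta[a]
        weights = [sum(weights[p] * m[p][q] for p in range(len(weights)))
                   for q in range(len(weights))]
    return sum(w * out for w, out in zip(weights, o))

# The two-state example above: q0 has a weight-1 self-loop and a
# weight-1 edge to q1; q1 has a weight-2 self-loop; outputs are 2 and 3.
i = [1, 0]
delta = {"a": [[1, 1],
               [0, 2]]}
o = [2, 3]

print(wfa_value("aaa", i, delta, o))  # 2*1 + 3*7 = 23
```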

The learning algorithm assumes access to a *teacher* (sometimes also called *oracle*), who answers two types of queries:

- *membership queries*, where the learner supplies a word w ∈ A<sup>∗</sup> and the teacher returns its value L(w);
- *equivalence queries*, where the learner supplies a hypothesis automaton and the teacher either answers **yes** or provides a counterexample word on which the hypothesis disagrees with L.


In practice, membership queries are often easily implemented by interacting with the system one wants to model. Equivalence queries are more complicated: as the perfect teacher does not exist and the target automaton is not known, they are commonly approximated by testing. Such testing can, however, be done exhaustively if a bound on the number of states of the target automaton is known. Equivalence queries can also be implemented exactly when learning algorithms are compared experimentally on generated automata whose languages form the targets. In this case, standard methods for language equivalence, such as the ones based on bisimulations [9], can be used.

The learning algorithm incrementally builds an *observation table*, which at each stage contains partial information about the language L determined by two finite sets S, E <sup>⊆</sup> <sup>A</sup><sup>∗</sup>. The algorithm fills the table through membership queries. As an example, and to set notation, consider the following table (over A <sup>=</sup> {a}).

$$\begin{array}{c|ccc} & \varepsilon & a & aa \\ \hline \varepsilon & 0 & 1 & 3 \\ a & 1 & 3 & 7 \\ \hline aa & 3 & 7 & 15 \end{array} \qquad \begin{aligned} \mathrm{row} &: S \to \mathbb{R}^{E} & \mathrm{row}(u)(v) &= \mathcal{L}(uv) \\ \mathrm{srow} &: S \cdot A \to \mathbb{R}^{E} & \mathrm{srow}(ua)(v) &= \mathcal{L}(uav) \end{aligned}$$

This table indicates that <sup>L</sup> assigns 0 to ε, 1 to a, 3 to aa, 7 to aaa, and 15 to aaaa. For instance, we see that row(a)(aa) = srow(aa)(a) = 7. Since row and srow are fully determined by the language L, we will refer to an observation table as a pair (S, E), leaving the language <sup>L</sup> implicit.
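The table functions can be mirrored directly in code. The following sketch (ours, not from the paper) simulates the teacher by the closed-form language L(a^j) = 2^j − 1 used in Section 2.1, and fills `row` and `srow` by membership queries:

```python
def L(w):
    """Membership oracle for the running example: L(a^j) = 2^j - 1."""
    return 2 ** len(w) - 1

A = ["a"]
S = ["", "a"]                  # row labels (prefixes)
E = ["", "a", "aa", "aaa"]     # column labels (tests)

def row(u):                    # row: S -> R^E,  row(u)(v) = L(uv)
    return tuple(L(u + v) for v in E)

def srow(ua):                  # srow: S*A -> R^E,  srow(ua)(v) = L(uav)
    return tuple(L(ua + v) for v in E)

assert row("a") == (1, 3, 7, 15)
assert srow("aa") == (3, 7, 15, 31)
# row and srow agree wherever both are defined on the same word:
assert row("a")[E.index("aa")] == srow("aa")[E.index("a")] == 7
```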

If the observation table (S, E) satisfies certain properties described below, then it represents a WFA (S, δ, i, o), called the *hypothesis*, as follows:

**–** The set of states is S, with initial state ε (i.e., initial weight 1 on ε and 0 on all other states).
**–** The output weight of a state s ∈ S is o(s) = row(s)(ε).
**–** The transition δ(s)(a) is given by coefficients (r<sub>s′</sub>)<sub>s′∈S</sub> witnessing closedness, i.e., such that srow(sa) = Σ<sub>s′∈S</sub> r<sub>s′</sub> · row(s′).
For this to be well-defined, we need to have ε ∈ S (for the initial weights) and ε ∈ E (for the output weights), and for the transition function there is a crucial property of the table that needs to hold: closedness. In the weighted setting, a table is closed if for all t ∈ S · A there exist r<sub>s</sub> ∈ R for all s ∈ S such that

$$\text{srow}(t) = \sum\_{s \in S} r\_s \cdot \text{row}(s).$$

If this is not the case for a given t ∈ S · A, the algorithm adds t to S. The table is repeatedly extended in this manner until it is closed. The algorithm then constructs a hypothesis, using the closedness witnesses to determine transitions, and poses an equivalence query to the teacher. It terminates when the answer is **yes**; otherwise, it extends the table by adding all suffixes of the counterexample provided to E, and the procedure continues by closing the resulting table again. In the next subsection we describe the algorithm through an example.
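The closing loop can be sketched as follows. Here `cs` stands in for an assumed closedness strategy: it returns witness coefficients, or `None` on a defect. For the demonstration we use the single-column table E = {ε} from the start of the example in Section 2.1, where solving reduces to a scalar equation over the reals (names and signatures are ours):

```python
def close(S, A, cs):
    """Repeatedly add closedness defects t in S*A to S until the
    table is closed, as in the inner loop described above."""
    while True:
        defect = next((s + a for s in S for a in A
                       if cs(S, s + a) is None), None)
        if defect is None:
            return S                 # every srow(t) is a combination of rows
        S = S + [defect]             # promote the defect row to a label in S

L = lambda w: 2 ** len(w) - 1        # target language: L(a^j) = 2^j - 1

def cs(S, t):
    """Strategy for the one-column table E = {eps}: over R we can solve
    r * L(s) = L(t) whenever some row entry L(s) is nonzero."""
    target = L(t)
    for s in S:
        if L(s) != 0:
            return {s: target / L(s)}
    return {} if target == 0 else None

# Starting from S = {eps}, the word a is a defect and gets added:
assert close([""], ["a"], cs) == ["", "a"]
```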

*Remark 1.* The original L<sup>∗</sup> algorithm requires a second property to construct a hypothesis, called *consistency*. Consistency is difficult to check in extended settings, so the present paper is based on a variant of the algorithm inspired by Maler and Pnueli [19] where only closedness is checked and counterexamples are handled differently. See [13] for an overview of consistency in different settings.

#### **2.1 Example: Learning a Weighted Language over the Reals**

Throughout this section we consider the following weighted language:

$$\mathcal{L} \colon \{a\}^* \to \mathbb{R} \qquad\qquad\qquad \mathcal{L}(a^j) = 2^j - 1.$$

The minimal WFA recognising it has 2 states. We will illustrate how the weighted variant of Angluin's algorithm recovers this WFA.

We start from S = E = {ε}, and fill the entries of the table on the left below by asking membership queries for ε and a. The table is not closed, and hence we build the table on the right, adding the membership result for aa.

$$\begin{array}{c|c}
 & \varepsilon \\ \hline
\varepsilon & 0 \\ \hline
a & 1
\end{array}
\qquad\qquad
\begin{array}{c|c}
 & \varepsilon \\ \hline
\varepsilon & 0 \\
a & 1 \\ \hline
aa & 3
\end{array}$$

The resulting table is closed, as srow(aa) = 3 · row(a), so we construct the hypothesis A<sub>1</sub>.

The teacher replies **no** and gives the counterexample aaa, which is assigned 9 by the hypothesis automaton A<sub>1</sub> but 7 in the language. Therefore, we extend E ← E ∪ {a, aa, aaa}. The table becomes the one below.

$$\begin{array}{c|cccc}
 & \varepsilon & a & aa & aaa \\ \hline
\varepsilon & 0 & 1 & 3 & 7 \\
a & 1 & 3 & 7 & 15 \\ \hline
aa & 3 & 7 & 15 & 31
\end{array}$$

It is closed, as srow(aa) = 3 · row(a) − 2 · row(ε), so we construct a new hypothesis A<sub>2</sub>.

The teacher replies **yes**, as A<sub>2</sub> accepts the intended language, assigning 2<sup>j</sup> − 1 ∈ R to the word a<sup>j</sup>, and the algorithm terminates with the correct automaton.
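This final check can be reproduced mechanically. The sketch below (ours) encodes the hypothesis with states S = {ε, a}, reading the transition weights off the closedness witnesses srow(a) = 1 · row(a) and srow(aa) = 3 · row(a) − 2 · row(ε), and confirms that it assigns 2^j − 1 to a^j:

```python
# Hypothesis over states ("" for eps, "a"): initial weight on eps,
# outputs o(eps) = 0, o(a) = 1, transitions from the closedness witnesses.
delta = {
    "": {"": 0, "a": 1},    # srow(a)  = 1*row(a)
    "a": {"": -2, "a": 3},  # srow(aa) = 3*row(a) - 2*row(eps)
}
o = {"": 0, "a": 1}

def language(j):
    """Weight assigned by the hypothesis to the word a^j."""
    v = {"": 1, "a": 0}                      # initial vector: Dirac on eps
    for _ in range(j):                       # one a-step: v <- v . delta
        v = {q: sum(v[p] * delta[p][q] for p in v) for q in v}
    return sum(v[q] * o[q] for q in v)

assert all(language(j) == 2 ** j - 1 for j in range(10))
```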

#### **2.2 Learning Weighted Languages over Arbitrary Semirings**

Consider now the same language as above, but represented as a map L: {a}<sup>∗</sup> → N over the semiring of natural numbers instead of a map L: {a}<sup>∗</sup> → R over the reals. Accordingly, we consider a variant of the learning algorithm over the semiring N rather than the algorithm over R described above. At first, the run of the algorithm for N is the same as above, but after receiving the counterexample we can no longer observe that srow(aa) = 3 · row(a) − 2 · row(ε), since −2 ∉ N. In fact, there are no m, n ∈ N such that srow(aa) = m · row(ε) + n · row(a). To see this, consider the first two columns in the table and note that the fraction 3/7 is bigger than 0/1 = 0 and 1/3, so srow(aa) cannot be obtained as a linear combination of the latter two rows using natural numbers. We thus have a closedness defect and update S ← S ∪ {aa}, leading to the table below.

$$\begin{array}{c|cccc}
 & \varepsilon & a & aa & aaa \\ \hline
\varepsilon & 0 & 1 & 3 & 7 \\
a & 1 & 3 & 7 & 15 \\
aa & 3 & 7 & 15 & 31 \\ \hline
aaa & 7 & 15 & 31 & 63
\end{array}$$

Again, the table is not closed, since 7/15 is bigger than 3/7, the largest such fraction among the rows of S. In fact, these closedness defects continue appearing indefinitely, leading to non-termination of the algorithm. This is shown formally in Section 5.
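Both impossibility claims above can be verified by a bounded search (a sketch of ours): since every generator has a nonzero entry in the a-column, the coefficients of any candidate combination are bounded by the target's entry in that column, so the search is exhaustive.

```python
from itertools import product

rows = {"eps": (0, 1, 3, 7), "a": (1, 3, 7, 15), "aa": (3, 7, 15, 31)}
target_aa = (3, 7, 15, 31)     # srow(aa), before aa was added to S
target_aaa = (7, 15, 31, 63)   # srow(aaa), after aa was added to S

def combinations(cols, bound):
    """All N-linear combinations with coefficients at most `bound`."""
    for ks in product(range(bound + 1), repeat=len(cols)):
        yield tuple(sum(k * c[i] for k, c in zip(ks, cols))
                    for i in range(4))

# In the 'a' column, m*1 + n*3 = 7 bounds m, n by 7; similarly for aaa.
assert target_aa not in combinations([rows["eps"], rows["a"]], 7)
assert target_aaa not in combinations(list(rows.values()), 15)
```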

Note, however, that there does exist a WFA over N accepting this language:

$$q_0 \xrightarrow{\,a,1\,} q_0, \qquad q_0 \xrightarrow{\,a,1\,} q_1, \qquad q_1 \xrightarrow{\,a,2\,} q_1; \qquad i = q_0, \quad o(q_0) = 0, \quad o(q_1) = 1 \tag{1}$$

The reason that the algorithm cannot find the correct automaton is closely related to the algebraic structure induced by the semiring. In the case of the reals, the algebras are vector spaces, and resolving closedness defects increases the dimension of the hypothesis WFA, which in turn cannot exceed the dimension of the minimal WFA for the language. In the case of the natural numbers, the algebras are commutative monoids, for which no such notion of dimension exists, and unfortunately the algorithm does not terminate. In Section 6 we show that one can get around this problem for a class of semirings which includes the integers.

We mentioned earlier that during experimental evaluation the target WFA is known, and equivalence queries may be implemented via standard language equivalence methods. A further issue with arbitrary semirings is that language equivalence can be undecidable; that is the case, e.g., for the tropical semiring.

In Section 3 we recall basic definitions used throughout the paper, after which Section 4 introduces our general algorithm with its (parameterised) termination proof of Theorem 14. We then proceed to prove non-termination of the example discussed above over the natural numbers in Section 5 before instantiating our algorithm to PIDs in Section 6 and showing that it terminates in Theorem 28. We conclude with a discussion of related and future work in Section 7.

## **3 Preliminaries**

Throughout this paper we fix a semiring<sup>5</sup> <sup>S</sup> and a finite alphabet A. We start with basic definitions related to semimodules and weighted languages.

**Definition 2 (Semimodule).** *<sup>A</sup>* (left) semimodule M *over* S *consists of a commutative monoid structure on* M*, written using* + *as the operation and* 0 *as the unit, together with a scalar multiplication map* · : S × M → M *such that:*

$$\begin{gathered}
s \cdot 0_M = 0_M \qquad 0_{\mathcal{S}} \cdot m = 0_M \qquad 1 \cdot m = m \\
s \cdot (m + n) = s \cdot m + s \cdot n \qquad (s + r) \cdot m = s \cdot m + r \cdot m \qquad (sr) \cdot m = s \cdot (r \cdot m).
\end{gathered}$$

*When the semiring is in fact a ring, we speak of a* module *rather than a semimodule. In the case of a field, the concept instantiates to a vector space.*

As an example, commutative monoids are the semimodules over the semiring of natural numbers. Any semiring forms a semimodule over itself by instantiating the scalar multiplication map to the internal multiplication. If X is any set and M is a semimodule, then M<sup>X</sup> with pointwise operations also forms a semimodule. A similar semimodule is the *free semimodule* over X, which differs from M<sup>X</sup> in that it fixes M to be S and requires its elements to have *finite support*. This enables an important operation called *linearisation*.

**Definition 3 (Free semimodule).** *The* free semimodule *over a set* X *is given by the set*

V(X) = {f : X → S | supp(f) *is finite*}

*with pointwise operations. Here* supp(f) = {x ∈ X | f(x) ≠ 0}*. We sometimes identify the elements of* V(X) *with formal sums over* X*. Any semimodule isomorphic to* V(X) *for some set* X *is called* free*.*

If X is a finite set, then V(X) = S<sup>X</sup>. We now define the *linearisation* of a function into a semimodule, which uniquely extends it to a semimodule homomorphism, witnessing the fact that V(X) is free.

<sup>5</sup> Rings and semirings considered in this paper are taken to be unital.

**Definition 4 (Linearisation).** *Given a set* X*, a semimodule* M*, and a function* f : X → M*, we define the* linearisation *of* f *as the semimodule homomorphism* f<sup>♯</sup> : V(X) → M *given by*

$$f^\sharp(\alpha) = \sum\_{x \in X} \alpha(x) \cdot f(x).$$

*The* (−)<sup>♯</sup> *operation has an inverse that maps a semimodule homomorphism* g : V(X) → M *to the function* g<sup>†</sup> : X → M *given by*

$$g^\dagger(x) = g(\partial\_x), \qquad \qquad \partial\_x(y) = \begin{cases} 1 & \text{if } y = x \\ 0 & \text{if } y \neq x. \end{cases}$$
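As an illustration, linearisation and the Dirac elements ∂<sub>x</sub> can be sketched in code, representing finite-support functions as dicts over their nonzero inputs (the names `dirac` and `linearise` are ours):

```python
def dirac(x):
    """The element d_x of V(X): weight 1 on x, 0 elsewhere."""
    return {x: 1}

def linearise(f):
    """Extend f: X -> M to the semimodule homomorphism f#: V(X) -> M.
    Elements of V(X), and of M, are dicts from basis elements to weights."""
    def f_sharp(alpha):
        out = {}
        for x, c in alpha.items():          # alpha = formal sum of x's
            for y, v in f(x).items():
                out[y] = out.get(y, 0) + c * v
        return {y: v for y, v in out.items() if v != 0}
    return f_sharp

f = lambda x: {"p": 2} if x == "a" else {"p": 1, "q": 3}
f_sharp = linearise(f)
assert f_sharp({"a": 1, "b": 2}) == {"p": 4, "q": 6}
assert f_sharp(dirac("b")) == f("b")    # the dagger inverse recovers f
```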

We proceed with the definition of WFAs and their languages.

**Definition 5 (WFA).** *A* weighted finite automaton (WFA) *over* S *is a tuple* (Q, δ, i, o)*, where* <sup>Q</sup> *is a finite set,* <sup>δ</sup> : <sup>Q</sup> <sup>→</sup> (S<sup>Q</sup>)<sup>A</sup>*, and* i, o: Q <sup>→</sup> <sup>S</sup>*.*

A *weighted language* (or just *language*) over S is a function A<sup>∗</sup> → S. To define the language accepted by a WFA A = (Q, δ, i, o), we first introduce the notions of *observability map* obs<sub>A</sub> : V(Q) → S<sup>A∗</sup> and *reachability map* reach<sub>A</sub> : V(A<sup>∗</sup>) → V(Q) as the semimodule homomorphisms given by

$$\begin{aligned} \mathsf{reach}\_{\mathcal{A}}^{\dagger}(\varepsilon) &= i & \mathsf{obs}\_{\mathcal{A}}(m)(\varepsilon) &= o^{\sharp}(m) \\ \mathsf{reach}\_{\mathcal{A}}^{\dagger}(ua) &= \delta^{\sharp}(\mathsf{reach}\_{\mathcal{A}}^{\dagger}(u))(a) & \mathsf{obs}\_{\mathcal{A}}(m)(au) &= \mathsf{obs}\_{\mathcal{A}}(\delta^{\sharp}(m)(a))(u). \end{aligned}$$

The *language accepted by a WFA* A = (Q, δ, i, o) is the function L<sub>A</sub> : A<sup>∗</sup> → S given by L<sub>A</sub> = obs<sub>A</sub>(i). Equivalently, one can define it as L<sub>A</sub> = o<sup>♯</sup> ◦ reach<sup>†</sup><sub>A</sub>.

## **4 General Algorithm for WFAs**

In this section we define the general algorithm for WFAs over S, as described informally in Section 2. Our algorithm assumes the existence of a *closedness strategy* (Definition 8), which allows one to check whether a table is closed, and in case it is, provide relevant witnesses. We then introduce sufficient conditions on <sup>S</sup> and on the language <sup>L</sup> to be learned under which the algorithm terminates.

**Definition 6 (Observation table).** *An* observation table *(or just* table*)* (S, E) *consists of two sets* S, E ⊆ A<sup>∗</sup>*. We write* Table<sub>fin</sub> = P<sub>f</sub>(A<sup>∗</sup>) × P<sub>f</sub>(A<sup>∗</sup>) *for the set of finite tables (where* P<sub>f</sub>(X) *denotes the collection of finite subsets of a set* X*). Given a language* L : A<sup>∗</sup> → S*, an observation table* (S, E) *determines the* row *function* row<sub>(S,E,L)</sub> : S → S<sup>E</sup> *and the* successor row *function* srow<sub>(S,E,L)</sub> : S · A → S<sup>E</sup> *as follows:*

$$\mathsf{row}_{(S,E,\mathcal{L})}(w)(v) = \mathcal{L}(wv) \qquad\qquad \mathsf{srow}_{(S,E,\mathcal{L})}(wa)(v) = \mathcal{L}(wav).$$

*We often write* row<sup>L</sup> *and* srowL*, or even* row *and* srow*, when the parameters are clear from the context.*

A table is *closed* if the successor rows are linear combinations of the existing rows in S. To make this precise, we use the linearisation row<sup>♯</sup> (Definition 4), which extends row to linear combinations of words in S.

**Definition 7 (Closedness).** *Given a language* L*, a table* (S, E) *is* closed *if for all* w ∈ S *and* a ∈ A *there exists* α ∈ V(S) *such that* srow(wa) = row<sup>♯</sup>(α)*.*

This corresponds to the notion of closedness described in Section 2.

A further important ingredient of the algorithm is a method for checking whether a table is closed. This is captured by the notion of closedness strategy.

**Definition 8 (Closedness strategy).** *Given a language* L*, a* closedness strategy *for* L *is a family of computable functions*

$$\left(\mathsf{cs}_{(S,E)} \colon S \cdot A \to \{\bot\} \cup V(S)\right)_{(S,E) \in \mathsf{Table}_{\mathsf{fin}}}$$

*satisfying the following two properties:*

**–** *if* cs<sub>(S,E)</sub>(t) = ⊥*, then there is no* α ∈ V(S) *s.t.* row<sup>♯</sup>(α) = srow(t)*, and*
**–** *if* cs<sub>(S,E)</sub>(t) ≠ ⊥*, then* row<sup>♯</sup>(cs<sub>(S,E)</sub>(t)) = srow(t)*.*

Thus, given a closedness strategy as above, a table (S, E) is closed iff cs<sub>(S,E)</sub>(t) ≠ ⊥ for all t ∈ S·A. More specifically, for each t ∈ S·A we have that cs<sub>(S,E)</sub>(t) ≠ ⊥ iff the (successor) row corresponding to t already forms a linear combination of rows labelled by S. In that case, this linear combination is returned by cs<sub>(S,E)</sub>(t). This is used to close tables in our learning algorithm, introduced below.

Examples of semirings and (classes of) languages that admit a closedness strategy are described at the end of this section. Important for our algorithm is that closedness strategies are computable. This problem is equivalent to solving systems of equations Ax = b, where A is the matrix whose columns are row(s) for s ∈ S, x is a vector of length |S|, and b is the vector of entries of srow(t) for some t ∈ S · A. These observations motivate the following definition.

**Definition 9 (Solvability).** *A semiring* S *is* solvable *if a solution to any finite system of linear equations of the form* Ax <sup>=</sup> b *is computable.*

We have the following correspondence.

**Proposition 10.** *For any language accepted by a WFA over any semiring there exists a closedness strategy if and only if the semiring is solvable.*

*Proof.* If the semiring is solvable, we obtain a closedness strategy by the remarks prior to Definition 9. Conversely, we can construct a language that is non-zero on finitely many words and encode in a table (S, E) a given linear equation. To be able to freely choose the value in each table cell, we can consider a sufficiently large alphabet to make sure S and E contain only single-letter words. This avoids dependencies within the table. 

**Algorithm 1** Abstract learning algorithm for WFA over S

```
1: S, E ← {ε}
2: while true do
3: while cs(S,E)(t) = ⊥ for some t ∈ S · A do
4: S ← S ∪ {t}
5: for s ∈ S do
6: o(s) ← rowL(s)(ε)
7: for a ∈ A do
8: δ(s)(a) ← cs(S,E)(sa)
9: if EQ(S, δ, ε, o) = w ∈ A∗ then
10: E ← E ∪ suffixes(w)
11: else
12: return (S, δ, ε, o)
```
We now have all the ingredients to formulate the algorithm to learn weighted languages over a general semiring. The pseudocode is displayed in Algorithm 1.

The algorithm keeps a table (S, E), and starts by initialising both S and E to contain just the empty word. The inner while loop (lines 3–4) uses the closedness strategy to repeatedly check whether the current table is closed and add new rows in case it is not. Once the table is closed, a hypothesis is constructed, again using the closedness strategy (lines 5–8). This hypothesis (S, δ, ε, o) is then given to the teacher for an equivalence check. The equivalence check is modelled by EQ (line 9) as follows: if the hypothesis is incorrect, the teacher non-deterministically returns a counterexample <sup>w</sup> <sup>∈</sup> <sup>A</sup><sup>∗</sup>, the condition evaluates to true, and the suffixes of w are added to E; otherwise, if the hypothesis is correct, the condition on line 9 evaluates to false, and the algorithm returns the correct hypothesis on line 12.
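The whole loop can be sketched concretely. The following simplified rendering (ours, with all names our own) instantiates Algorithm 1 to the rationals and the running example L(a^j) = 2^j − 1: the closedness strategy is exact Gaussian elimination with `fractions.Fraction`, and the equivalence query is approximated by testing on words up to a fixed length, as discussed in Section 2.

```python
from fractions import Fraction

A = ["a"]
L = lambda w: 2 ** len(w) - 1              # target language: L(a^j) = 2^j - 1

def solve(cols, b):
    """Gaussian elimination over Q: x with sum_i x[i]*cols[i] = b, or None."""
    n, k = len(cols), len(b)
    aug = [[Fraction(cols[i][j]) for i in range(n)] + [Fraction(b[j])]
           for j in range(k)]
    piv, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, k) if aug[i][c] != 0), None)
        if p is None:
            continue
        aug[r], aug[p] = aug[p], aug[r]
        aug[r] = [v / aug[r][c] for v in aug[r]]
        for i in range(k):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[r])]
        piv.append(c)
        r += 1
    if any(aug[i][n] != 0 for i in range(r, k)):
        return None                         # inconsistent: closedness defect
    x = [Fraction(0)] * n
    for i, c in enumerate(piv):
        x[c] = aug[i][n]
    return x

def learn(max_test_len=12):
    S, E = [""], [""]
    row = lambda u: tuple(L(u + v) for v in E)
    while True:
        # Lines 3-4: close the table.
        t = next((s + a for s in S for a in A
                  if solve([row(x) for x in S], row(s + a)) is None), None)
        if t is not None:
            S.append(t)
            continue
        # Lines 5-8: build the hypothesis (S, delta, eps, o).
        o = [row(s)[0] for s in S]          # E[0] is the empty word
        delta = {s: {a: solve([row(x) for x in S], row(s + a)) for a in A}
                 for s in S}
        def value(w):                       # language of the hypothesis
            v = [Fraction(1 if s == "" else 0) for s in S]
            for a in w:
                v = [sum(v[i] * delta[S[i]][a][j] for i in range(len(S)))
                     for j in range(len(S))]
            return sum(v[i] * o[i] for i in range(len(S)))
        # Line 9: equivalence query, approximated by testing short words.
        cex = next((w for w in ("a" * j for j in range(max_test_len))
                    if value(w) != L(w)), None)
        if cex is None:
            return S, E, o, delta
        # Lines 10: add all suffixes of the counterexample to E.
        E += [cex[i:] for i in range(len(cex) - 1, -1, -1) if cex[i:] not in E]

S, E, o, delta = learn()
assert S == ["", "a"] and E == ["", "a", "aa", "aaa"]
```

On this input the run mirrors Section 2.1: the first hypothesis assigns 9 to aaa, the counterexample aaa extends E, and the second hypothesis is correct.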

#### **4.1 Termination of the General Algorithm**

The main question remaining is: under which conditions does this algorithm terminate and hence learns the unknown weighted language? We proceed to give abstract conditions under which it terminates. There are two main assumptions:


**–** the existence of a *progress measure* for the target language (Definition 11), and
**–** the requirement that the Hankel matrix of the target language (Definition 12) satisfies the *ascending chain condition* (Definition 13).

The first assumption is captured by the definition of progress measure:

**Definition 11 (Progress measure).** *A* progress measure *for a language* L *is a function* size : Tablefin <sup>→</sup> <sup>N</sup> *such that*


**–** *there exists a bound* N ∈ N *such that* size(S, E) ≤ N *for every table* (S, E)*, and*
**–** *whenever* E ⊆ E′ *and some* α, β ∈ V(S) *satisfy* row<sup>♯</sup>(α) = row<sup>♯</sup>(β) *in the table* (S, E) *but* row<sup>♯</sup>(α) ≠ row<sup>♯</sup>(β) *in* (S, E′)*, we have* size(S, E) < size(S, E′)*.*

A progress measure assigns a 'size' to each table, in such a way that (a) there is a global bound on the size of tables, and (b) if we extend a table with some proper tests in E, i.e., such that some combinations of rows in row that were equal before get distinguished by a newly added test, then the size of the extended table is properly above the size of the original table. This is used to ensure that, when adding certain counterexamples supplied by the teacher, the size of the table, measured according to the above size function, properly increases.

The second assumption that we use for termination is phrased in terms of the Hankel matrix associated to the input language L, which represents L as the (semimodule generated by the) infinite table where both the rows and columns contain all words. The Hankel matrix is defined as follows.

**Definition 12 (Hankel matrix).** *Given a language* <sup>L</sup>: A<sup>∗</sup> <sup>→</sup> <sup>S</sup>*, the* semimodule generated by a table (S, E) *is given by the image of* row*. We refer to the semimodule generated by the table* (A<sup>∗</sup>, A<sup>∗</sup>) *as the* Hankel matrix *of* <sup>L</sup>*.*

The Hankel matrix is approximated by the tables that occur during the execution of the algorithm. For termination, we will therefore assume that this matrix satisfies the following finite approximation condition.

**Definition 13 (Ascending chain condition).** *We say that a semimodule* M *satisfies the* ascending chain condition *if for all inclusion chains of subsemimodules of* M*,*

$$S\_1 \subseteq S\_2 \subseteq S\_3 \subseteq \dotsb \text{ , }$$

*there exists* <sup>n</sup> <sup>∈</sup> <sup>N</sup> *such that for all* <sup>m</sup> <sup>≥</sup> <sup>n</sup> *we have* <sup>S</sup><sup>m</sup> <sup>=</sup> <sup>S</sup><sup>n</sup>*.*

Given the notions of progress measure, Hankel matrix and ascending chain condition, we can formulate the general theorem for termination of Algorithm 1.

**Theorem 14 (Termination of the abstract learning algorithm).** *In the presence of a progress measure, Algorithm 1 terminates whenever the Hankel matrix of the target language satisfies the ascending chain condition (Definition 13).*

*Proof.* Suppose the algorithm does not terminate. Then there is a sequence {(S<sub>n</sub>, E<sub>n</sub>)}<sub>n∈N</sub> of tables, where (S<sub>0</sub>, E<sub>0</sub>) is the initial table and (S<sub>n+1</sub>, E<sub>n+1</sub>) is formed from (S<sub>n</sub>, E<sub>n</sub>) after resolving a closedness defect or adding columns due to a counterexample.

We write H<sub>n</sub> for the semimodule generated by the table (S<sub>n</sub>, A<sup>∗</sup>). We have S<sub>n</sub> ⊆ S<sub>n+1</sub> and thus H<sub>n</sub> ⊆ H<sub>n+1</sub>. Note that a closedness defect for (S<sub>n</sub>, E<sub>n</sub>) is also a closedness defect for (S<sub>n</sub>, A<sup>∗</sup>), so if we resolve the defect in the next step, the inclusion H<sub>n</sub> ⊆ H<sub>n+1</sub> is strict. Since all of these are included in the Hankel matrix, which satisfies the ascending chain condition, there must be an n such that for all k ≥ n the table (S<sub>k</sub>, E<sub>k</sub>) is closed.

In [13, Section 6] it is shown that in a general table used for learning automata with side-effects given by a monad, each counterexample for the corresponding hypothesis has a suffix that, when added as a column label, leads either to a closedness defect or to distinguishing two combinations of rows in the table. Since WFAs are automata with side-effects given by the free semimodule monad<sup>6</sup> and we add all suffixes of the counterexample to the set of column labels, this also happens in our algorithm. Thus, for all k ≥ n where we process a counterexample, two linear combinations of rows must become distinguished, as closedness is already guaranteed. Then the semimodule generated by (S<sub>k</sub>, E<sub>k</sub>) is a strict quotient of the semimodule generated by (S<sub>k+1</sub>, E<sub>k+1</sub>). By the progress measure we then find size(S<sub>k</sub>, E<sub>k</sub>) < size(S<sub>k+1</sub>, E<sub>k+1</sub>), which cannot happen infinitely often. We conclude that the algorithm must terminate.

To illustrate the hypotheses needed for Algorithm 1 and its termination (Theorem 14), we consider two classes of semirings for which learning algorithms are already known in the literature [7,13].

*Example 15 (Weighted languages over fields).* Consider any field for which the basic operations are computable. Solvability is then satisfied via a procedure such as Gaussian elimination, so by Proposition 10 there exists a closedness strategy. Hence, we can instantiate Algorithm 1 with S being such a field.

For termination, we show that the hypotheses of Theorem 14 are satisfied whenever the input language is accepted by a WFA. First, a progress measure is given by the dimension of the vector space generated by the table. To see this, note that if we distinguish two linear combinations of rows, we can assume without loss of generality that one of these linear combinations in the extended table uses only basis elements. This in turn can be rewritten to distinguishing a single row from a linear combination of rows using field operations, with the property that the extended version of the single row is a basis element. Hence, the row was not a basis element in the original table, and therefore the dimension of the vector space generated by the table has increased. Adding rows and columns cannot decrease this dimension, so it is bounded by the dimension of the Hankel matrix. Since the language we want to learn is accepted by a WFA, the associated Hankel matrix has a finite dimension [10,12] (see also, e.g., [5]), providing a bound for our progress measure.
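The dimension-based progress measure can be computed exactly with rational arithmetic. A sketch of ours (`rank` is our own helper): on the example of Section 2.1, adding the counterexample columns raises the dimension of the row space from 1 to 2.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of the given vectors over Q, by elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [v - f * w for v, w in zip(rows[i], rows[r])]
        r += 1
    return r

# Rows of the table with E = {eps}, and with E = {eps, a, aa, aaa}:
assert rank([(0,), (1,)]) == 1
assert rank([(0, 1, 3, 7), (1, 3, 7, 15)]) == 2
```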

Finally, for any ascending chain of subspaces of the Hankel matrix, these subspaces are of finite dimension bounded by the dimension of the Hankel matrix. The dimension increases along a strict subspace relation, so the chain converges.

*Example 16 (Weighted languages over finite semirings).* Consider any finite semiring. Finiteness allows us to apply a brute force approach to solving systems of equations. This means the semiring is solvable, and hence a closedness strategy exists by Proposition 10.
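Such a brute-force check is straightforward to sketch: enumerate every coefficient vector over the finite carrier. Below over the Boolean semiring, with addition ∨ and multiplication ∧ (names and signatures are ours):

```python
from itertools import product

def brute_force_solve(cols, b, elems, add, mul, zero):
    """Search all x in S^n with sum_i x[i]*cols[i] = b over a finite
    semiring S given by its carrier `elems` and operations."""
    for x in product(elems, repeat=len(cols)):
        out = [zero] * len(b)
        for xi, col in zip(x, cols):
            out = [add(o, mul(xi, c)) for o, c in zip(out, col)]
        if tuple(out) == tuple(b):
            return x
    return None

# Boolean semiring: addition is 'or', multiplication is 'and'.
OR, AND = (lambda p, q: p or q), (lambda p, q: p and q)
assert brute_force_solve([(0, 1), (1, 1)], (1, 1), [0, 1], OR, AND, 0) is not None
assert brute_force_solve([(0, 1), (1, 1)], (1, 0), [0, 1], OR, AND, 0) is None
```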

For termination, we can define a progress measure by assigning to each table the size of the image of row. Distinguishing two linear combinations of rows

<sup>6</sup> We note that [13] assumes the monad to preserve finite sets. However, the relevant arguments do not depend on this.

increases this measure. If the language we want to learn is accepted by a WFA, then the Hankel matrix consists of a subset of the linear combinations of the languages of the states of that WFA. Since there are only finitely many such linear combinations, the Hankel matrix is finite, which bounds our measure. Moreover, a finite semimodule such as this Hankel matrix does not admit infinite chains of strictly ascending subsemimodules. We conclude by Theorem 14 that Algorithm 1 terminates in the instance where the semiring S is finite, if the input language is accepted by a WFA over S.

For the Boolean semiring, an instance of the above finite semiring example, WFAs are non-deterministic finite automata. The algorithm we recover by instantiating Algorithm 1 to this case is close to the algorithm first described by Bollig et al. [8]. The main differences are that in their case the hypothesis has a state space given by a minimally generating subset of the distinct rows in the table rather than all elements of S, and they do apply a notion of consistency.

In Section 6 we will show that Algorithm 1 can learn WFAs over principal ideal domains—notably including the integers—thus providing a strict generalisation of existing techniques.

## **5 Issues with Arbitrary Semirings**

We concluded the previous section with examples of semirings for which Algorithm 1 terminates if the target language is accepted by a WFA. In this section, we prove a negative result for the algorithm over the semiring N: we show that it does not terminate on a certain language accepted by a WFA over N, as anticipated in Section 2.2. This means that Algorithm 1 does not work well for arbitrary semirings. The problem is that the Hankel matrix of a language recognised by a WFA does not necessarily satisfy the ascending chain condition used to prove Theorem 14. In the example given in the proof below, the Hankel matrix is not even finitely generated.

**Theorem 17.** *There exists a WFA* A<sub>N</sub> *over* N *such that Algorithm 1 does not terminate when given* L<sub>A<sub>N</sub></sub> *as input, regardless of the closedness strategy used.*

*Proof.* Let A<sub>N</sub> be the automaton over the alphabet {a} given in (1) in Section 2.2. Formally, A<sub>N</sub> = (Q, δ, i, o), where

$$\begin{aligned} Q &= \{q\_0, q\_1\} \\ \delta(q\_0)(a) &= q\_0 + q\_1 \end{aligned} \qquad \begin{aligned} i &= q\_0 & o(q\_0) &= 0 \\ \delta(q\_1)(a) &= 2q\_1 & o(q\_1) &= 1. \end{aligned}$$

As mentioned in Section 2.2, the language L: {a}<sup>∗</sup> → N accepted by A<sub>N</sub> is given by L(a<sup>j</sup>) = 2<sup>j</sup> − 1. This can be shown more precisely as follows. First one shows by induction on j that obs<sub>A<sub>N</sub></sub>(q<sub>1</sub>)(a<sup>j</sup>) = 2<sup>j</sup> for all j ∈ N—we leave the straightforward argument to the reader. Second, we show, again by induction on j, that obs<sub>A<sub>N</sub></sub>(q<sub>0</sub>)(a<sup>j</sup>) = 2<sup>j</sup> − 1. This implies the claim, as L = obs<sub>A<sub>N</sub></sub>(q<sub>0</sub>). For j = 0 we have obs<sub>A<sub>N</sub></sub>(q<sub>0</sub>)(a<sup>0</sup>) = o(q<sub>0</sub>) = 0 = 2<sup>0</sup> − 1 as required. For the inductive step, let j = k + 1 and assume obs<sub>A<sub>N</sub></sub>(q<sub>0</sub>)(a<sup>k</sup>) = 2<sup>k</sup> − 1. We calculate

$$\begin{split} \mathsf{obs}\_{\mathcal{A}\_{\mathbb{N}}}(q\_{0})(a^{k+1}) &= \mathsf{obs}\_{\mathcal{A}\_{\mathbb{N}}}(q\_{0} + q\_{1})(a^{k}) \\ &= \mathsf{obs}\_{\mathcal{A}\_{\mathbb{N}}}(q\_{0})(a^{k}) + \mathsf{obs}\_{\mathcal{A}\_{\mathbb{N}}}(q\_{1})(a^{k}) \\ &= (2^{k} - 1) + 2^{k} \\ &= 2^{k+1} - 1. \end{split}$$

Note that in particular the language L is injective.
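Both inductive claims can be double-checked computationally; the sketch below (ours) represents an element of V(Q) by its pair of weights on (q0, q1) and applies the transitions of A_N:

```python
def obs(state_vec, word):
    """Observability for A_N: the weight the linear combination
    w0*q0 + w1*q1 assigns to the given word."""
    w0, w1 = state_vec
    for _ in word:
        # delta(q0)(a) = q0 + q1 and delta(q1)(a) = 2*q1, so one a-step
        # maps w0*q0 + w1*q1 to w0*q0 + (w0 + 2*w1)*q1.
        w0, w1 = w0, w0 + 2 * w1
    return w1                       # outputs: o(q0) = 0, o(q1) = 1

for j in range(12):
    assert obs((0, 1), "a" * j) == 2 ** j          # obs(q1)(a^j) = 2^j
    assert obs((1, 0), "a" * j) == 2 ** j - 1      # obs(q0)(a^j) = 2^j - 1
```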

Towards a contradiction, suppose the algorithm does terminate with table (S, E). Let J = {j ∈ N | a<sup>j</sup> ∈ S} and define n = max(J). Since the algorithm terminates with table (S, E), the latter must be closed. In particular, there exist k<sub>j</sub> ∈ N for all j ∈ J such that Σ<sub>j∈J</sub> k<sub>j</sub> · row<sub>L</sub>(a<sup>j</sup>) = srow<sub>L</sub>(a<sup>n</sup>a). We consider two cases. First assume E = {ε} and let A′ = (Q′, δ′, i′, o′) be the hypothesis. For all l ∈ N we have row<sup>♯</sup><sub>L</sub>(reach<sup>†</sup><sub>A′</sub>(a<sup>l</sup>))(ε) = 2<sup>l</sup> − 1 because A′ must be correct. Thus, if a<sup>l</sup> ∈ S · A, then row<sup>♯</sup><sub>L</sub>(reach<sup>†</sup><sub>A′</sub>(a<sup>l</sup>)) = srow<sub>L</sub>(a<sup>l</sup>). In particular,

$$\mathsf{row}_{\mathcal{L}}^{\sharp}(\mathsf{reach}_{\mathcal{A}'}^{\dagger}(a^{n}a)) = \mathsf{srow}_{\mathcal{L}}(a^{n}a) = \sum_{j \in J} k_{j} \cdot \mathsf{row}_{\mathcal{L}}(a^{j}).$$

Note that we can choose the k<sub>j</sub> such that reach<sup>†</sup><sub>A′</sub>(a<sup>n</sup>a) = Σ<sub>j∈J</sub> k<sub>j</sub> · a<sup>j</sup>. Since

$$\begin{aligned} \text{row}\_{\mathcal{L}}^{\sharp} \left( \delta^{\prime \sharp} \left( \sum\_{j \in J} k\_j \cdot a^j \right) (a) \right) &= \text{row}\_{\mathcal{L}}^{\sharp} \left( \sum\_{j \in J} k\_j \cdot \delta^{\prime} (a^j) (a) \right) \\ &= \sum\_{j \in J} k\_j \cdot \text{row}\_{\mathcal{L}} (\delta^{\prime} (a^j) (a)) \\ &= \sum\_{j \in J} k\_j \cdot \text{row}\_{\mathcal{L}} (a^j a), \end{aligned}$$

we have row<sup>♯</sup><sub>L</sub>(reach<sup>†</sup><sub>A′</sub>(a<sup>n</sup>aa)) = Σ<sub>j∈J</sub> k<sub>j</sub> · srow<sub>L</sub>(a<sup>j</sup>a) and therefore

$$\sum_{j \in J} k_j \cdot \mathsf{srow}_{\mathcal{L}}(a^j a)(\varepsilon) = \mathsf{row}_{\mathcal{L}}^\sharp(\mathsf{reach}_{\mathcal{A}'}^\dagger(a^n a a))(\varepsilon) = 2^{n+2} - 1.$$

Then

$$\begin{aligned} 2^{n+2} - 1 &= \sum\_{j \in J} k\_j \cdot \text{srow}\_{\mathcal{L}}(a^j a)(\varepsilon) = \sum\_{j \in J} k\_j (2^{j+1} - 1) \\ &= 2 \left( \sum\_{j \in J} k\_j (2^j - 1) \right) + \sum\_{j \in J} k\_j = 2(2^{n+1} - 1) + \sum\_{j \in J} k\_j, \end{aligned}$$

so Σ<sub>j∈J</sub> k<sub>j</sub> = 1. This is only possible if there is j<sub>1</sub> ∈ J s.t. k<sub>j<sub>1</sub></sub> = 1 and k<sub>j</sub> = 0 for all j ∈ J \ {j<sub>1</sub>}. However, this implies that row<sub>L</sub>(a<sup>j<sub>1</sub></sup>) = srow<sub>L</sub>(a<sup>n</sup>a), and evaluating both sides at ε gives L(a<sup>j<sub>1</sub></sup>) = L(a<sup>n+1</sup>), which contradicts injectivity of L, as j<sub>1</sub> ≤ n. Thus, the algorithm did not terminate.

For the other case, assume there is a<sup>m</sup> <sup>∈</sup> E such that m <sup>≥</sup> 1. We have

$$2^{n+1} - 1 = \text{srow}\_{\mathcal{L}}(a^n a)(\varepsilon) = \sum\_{j \in J} k\_j \cdot \text{row}\_{\mathcal{L}}(a^j)(\varepsilon) = \sum\_{j \in J} k\_j (2^j - 1),$$

so

$$\begin{split} \sum_{j \in J} k_j(2^{j+m}-1) &= \sum_{j \in J} k_j \cdot \mathsf{row}_{\mathcal{L}}(a^j)(a^m) \\ &= \mathsf{srow}_{\mathcal{L}}(a^n a)(a^m) \\ &= 2^{n+m+1} - 1 \\ &= 2^m(2^{n+1}-1) + 2^m - 1 \\ &= 2^m \left( \sum_{j \in J} k_j(2^j - 1) \right) + 2^m - 1 \\ &= \left( \sum_{j \in J} k_j(2^{j+m} - 2^m) \right) + 2^m - 1 \\ &= \left( \sum_{j \in J} k_j(2^{j+m} - 1) \right) + \left( \sum_{j \in J} k_j(1 - 2^m) \right) + 2^m - 1. \end{split}$$

Then

$$\left(\sum\_{j\in J} k\_j(1-2^m)\right) + 2^m - 1 = 0.$$

Since m ≥ 1, the equation rearranges to (2<sup>m</sup> − 1)(1 − Σ<sub>j∈J</sub> k<sub>j</sub>) = 0, so Σ<sub>j∈J</sub> k<sub>j</sub> = 1. This is only possible if there is j<sub>1</sub> ∈ J s.t. k<sub>j<sub>1</sub></sub> = 1 and k<sub>j</sub> = 0 for all j ∈ J \ {j<sub>1</sub>}. However, this implies row<sub>L</sub>(a<sup>j<sub>1</sub></sup>) = srow<sub>L</sub>(a<sup>n</sup>a), which again contradicts injectivity of L, as j<sub>1</sub> ≤ n. Thus, the algorithm did not terminate.

*Remark 18.* Our proof shows non-termination for a bigger class of algorithms than Algorithm 1; it uses only the definition of the hypothesis, that closedness is satisfied before constructing the hypothesis, that S and E contain the empty word, and that termination implies correctness. For instance, adding the prefixes of a counterexample to S instead of its suffixes to E will not fix the issue.

We have thus shown that our algorithm does not instantiate to a terminating one for an arbitrary semiring. In contrast to this negative result, in the next section we identify a class of semirings, not previously explored in the learning literature, for which we can guarantee a terminating instantiation.

## **6 Learning WFAs over PIDs**

We show that for a subclass of semirings, namely *principal ideal domains (PIDs)*, the abstract learning algorithm of Section 4 terminates. This subclass includes the integers, the Gaussian integers, and rings of polynomials in one variable with coefficients in a field. We will prove that the Hankel matrix of a language over a PID accepted by a WFA has properties analogous to those of vector spaces: finite rank, a notion of progress measure, and the ascending chain condition. We also give a sufficient condition for PIDs to be solvable, which by Proposition 10 guarantees the existence of a closedness strategy for the learning algorithm.

To define PIDs, we first need to introduce ideals. Given a ring $S$, a *(left) ideal* $I$ of $S$ is an additive subgroup of $S$ s.t. for all $s \in S$ and $i \in I$ we have $si \in I$. The ideal $I$ is *(left) principal* if it is of the form $I = Ss$ for some $s \in S$.

**Definition 19 (PID).** *A* principal ideal domain $P$ *is a non-zero commutative ring in which every ideal is principal and where for all* $p_1, p_2 \in P$ *such that* $p_1 p_2 = 0$ *we have* $p_1 = 0$ *or* $p_2 = 0$*.*

A module $M$ over a PID $P$ is called *torsion free* if for all $p \in P$ and any $m \in M$ such that $p \cdot m = 0$ we have $p = 0$ or $m = 0$. It is a standard result that a module over a PID is torsion free if and only if it is free [17, Theorem 3.10].

The next definition of *rank* is analogous to that of the dimension of a vector space and will form the basis for the progress measure.

**Definition 20 (Rank).** *We define the* rank *of a finitely generated free module* $V(X)$ *over a PID as* $\mathrm{rank}(V(X)) = |X|$*.*

This definition extends to any finitely generated free module over a PID, as $V(X) \cong V(Y)$ for finite sets $X$ and $Y$ implies $|X| = |Y|$ [17, Theorem 3.4].
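For the concrete PID $\mathbb{Z}$, the rank of the free module generated by finitely many integer vectors coincides with the rank of the matrix of generators over the fraction field $\mathbb{Q}$. A minimal sketch using exact rational Gaussian elimination (an illustration of the notion; the helper name `rank_over_Q` is ours, not the paper's):

```python
from fractions import Fraction

def rank_over_Q(rows):
    """Rank of an integer matrix, computed exactly over the fraction field Q.
    For Z (a PID), this equals the rank of the free module spanned by the rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # find a pivot in the current column at or below row `rank`
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # eliminate the column below the pivot
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

print(rank_over_Q([[2, 4], [1, 2]]))  # → 1, the rows are linearly dependent
print(rank_over_Q([[2, 0], [0, 3]]))  # → 2
```

Exact `Fraction` arithmetic avoids the floating-point pivoting issues that would make the computed rank unreliable.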

Now that we have a candidate for a progress measure function, we need to prove it has the required properties. The following lemmas will help with this.

**Lemma 21.** *Given finitely generated free modules* $M, N$ *over a PID s.t.* $\mathrm{rank}(M) \geq \mathrm{rank}(N)$*, any surjective module homomorphism* $f \colon N \to M$ *is injective.*

*Proof.* Since $\mathrm{rank}(M) \geq \mathrm{rank}(N)$, there exists a surjective module homomorphism $g \colon M \to N$. Therefore $g \circ f \colon N \to N$ is surjective and by [23] an isomorphism. In particular, $f$ is injective.

**Lemma 22.** *If* $M$ *and* $N$ *are finitely generated free modules over a PID such that there exists a surjective module homomorphism* $f \colon N \to M$*, then* $\mathrm{rank}(M) \leq \mathrm{rank}(N)$*. If* $f$ *is not injective, then* $\mathrm{rank}(M) < \mathrm{rank}(N)$*.*

*Proof.* Let $f \colon N \to M$ be a surjective module homomorphism. Suppose towards a contradiction that $\mathrm{rank}(M) > \mathrm{rank}(N)$. By Lemma 21, $f$ is injective, so $M$ is isomorphic to a submodule of $N$ and $\mathrm{rank}(M) \leq \mathrm{rank}(N)$ [17]; contradiction.

For the second part, suppose $f$ is not injective and assume towards a contradiction that $\mathrm{rank}(M) \geq \mathrm{rank}(N)$. Again by Lemma 21, $f$ is injective, which contradicts our assumption. Thus, in this case $\mathrm{rank}(M) < \mathrm{rank}(N)$.

The lemma below states that the Hankel matrix of a weighted language over a PID has finite rank, which bounds the rank of any module generated by an observation table. This will be used to define a progress measure, which in turn is used to prove termination of the learning algorithm for weighted languages over PIDs.

**Lemma 23 (Hankel matrix rank for PIDs).** *When targeting a language accepted by a WFA over a PID, any module generated by an observation table is free. Moreover, the Hankel matrix has finite rank that bounds the rank of any module generated by an observation table.*

*Proof.* Given a WFA $\mathcal{A} = (Q, \delta, i, o)$, let $M$ be the free module generated by $Q$. Note that the Hankel matrix is the image of the composition $\mathrm{obs}_{\mathcal{A}} \circ \mathrm{reach}_{\mathcal{A}}$. Consider the image of the module homomorphism $\mathrm{reach}_{\mathcal{A}} \colon V(A^*) \to M$, which we write as $R$. Since $R$ is a submodule of $M$, we know from [17] that $R$ is free and finitely generated with $\mathrm{rank}(R) \leq \mathrm{rank}(M)$. The Hankel matrix can now be obtained as the image of the restriction of $\mathrm{obs}_{\mathcal{A}} \colon M \to S^{A^*}$ to the domain $R$. Let $H$ be this image, which we know is finitely generated because $R$ is. Since $H$ is a submodule of the torsion free module $S^{A^*}$, it is also torsion free and therefore free. We also have a surjective module homomorphism $s \colon R \to H$, so by Lemma 22 we find $\mathrm{rank}(H) \leq \mathrm{rank}(R)$.

Let $N$ be the module generated by an observation table $(S, E)$. Then $N$ is a quotient of the module generated by $(S, A^*)$, which in turn is a submodule of $H$. Using again [17] and Lemma 22 we conclude that $N$ is free and finitely generated with $\mathrm{rank}(N) \leq \mathrm{rank}(H)$.

The second part of Lemma 23 would follow from a PID variant of Fliess' theorem [12]. We are not aware of such a result, and leave this for future work.

**Proposition 24 (Progress measure for PIDs).** *There exists a progress measure for any language accepted by a WFA over a PID.*

*Proof.* Define $\mathrm{size}(S, E) = \mathrm{rank}(M)$, where $M$ is the module generated by the table $(S, E)$. By Lemma 23 this is bounded by the rank of the Hankel matrix. If $M$ and $N$ are modules generated by two tables such that $N$ is a strict quotient of $M$, then by Lemma 22 we have $\mathrm{rank}(M) > \mathrm{rank}(N)$.

Recall that, for termination of the algorithm, Theorem 14 requires a progress measure, which we defined above, and it requires the Hankel matrix of the language to satisfy the ascending chain condition (Definition 13). Proposition 25 shows that the latter is always the case for languages over PIDs.

**Proposition 25 (Ascending chain condition for PIDs).** *The Hankel matrix of a language accepted by a WFA over a PID satisfies the ascending chain condition.*

*Proof.* Let H be the Hankel matrix, which has finite rank by Lemma 23. If

$$M\_1 \subseteq M\_2 \subseteq M\_3 \subseteq \cdots$$

is any chain of submodules of $H$, then $M = \bigcup_{i \in \mathbb{N}} M_i$ is a submodule of $H$ and therefore also of finite rank [17]. Let $B$ be a finite basis of $M$. There exists $n \in \mathbb{N}$ such that $B \subseteq M_n$, so $M_n = M$.

The last ingredient for the abstract algorithm is solvability of the semiring: the following fact provides a sufficient condition for a PID to be solvable.

**Proposition 26 (PID solvability).** *A PID* P *is solvable if all of its ring operations are computable and if each element of* P *can be* effectively *factorised into irreducible elements.*

*Proof.* It is well known that a system of equations of the form $Ax = b$ with integer coefficients can be efficiently solved by computing the Smith normal form [25] of $A$. The algorithm generalises to principal ideal domains if we assume that the factorisation of any given element of the principal ideal domain<sup>7</sup> into irreducible elements is computable, cf. the algorithm in [16, pp. 79–84]. To see that all steps in this algorithm can be computed, one has to keep in mind that the factorisation can be used to determine the greatest common divisor of any two elements of the principal ideal domain.

*Remark 27.* In the case that we are dealing with a Euclidean domain P, a sufficient condition for P to be solvable is that Euclidean division is computable (again this can be deduced by inspecting the algorithm in [16, pp. 79–84]). Such a PID behaves essentially like the ring of integers.
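For the Euclidean domain $\mathbb{Z}$, for example, a single linear equation $ax + by = c$ is solvable over $\mathbb{Z}$ exactly when $\gcd(a, b)$ divides $c$, and a solution is produced by the extended Euclidean algorithm. The sketch below illustrates how computable Euclidean division yields solvability in this simplest case; it is our own illustration, not the full Smith-normal-form procedure of [16]:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g.
    Only Euclidean division (the // operator) is needed."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def solve_linear(a, b, c):
    """Solve a*x + b*y = c over Z (a, b not both zero);
    return None if no integer solution exists."""
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None  # solvable over the fraction field Q, but not over Z
    k = c // g
    return x * k, y * k

print(solve_linear(6, 10, 8))  # gcd(6, 10) = 2 divides 8 → a solution
print(solve_linear(6, 10, 7))  # 2 does not divide 7 → None
```

The contrast between the two calls mirrors the difference between solvability over a PID and over its quotient field, which is why learning over the field of fractions may produce weights outside the PID.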

Putting everything together, we obtain the main result of this section.

**Theorem 28 (Termination for PIDs).** *Algorithm 1 can be instantiated and terminates for any language accepted by a WFA over a PID of which all ring operations are computable and of which each element can be effectively factorised into irreducible elements.*

*Proof.* To instantiate the algorithm, we need a closedness strategy. According to Proposition 10 it is sufficient for the PID to be solvable, which is shown by Proposition 26. Proposition 24 provides a progress measure, and we know from Proposition 25 that the Hankel matrix satisfies the ascending chain condition, so by Theorem 14 the algorithm terminates. 

The example run given in Section 2.1 is the same when performed over the integers. We note that if the teacher holds an automaton model of the correct language, equivalence queries are decidable by lifting the embedding of the PID into its *quotient field* to the level of WFAs and checking equivalence there.

## **7 Discussion**

We have introduced a general algorithm for learning WFAs over arbitrary semirings, together with sufficient conditions for termination. We have shown an inherent termination issue over the natural numbers and proved termination for PIDs. Our work extends the results by Bergadano and Varricchio [7], who showed that WFAs over fields can be learned from a teacher. We note that although a PID can be embedded into its corresponding field of fractions, the WFAs produced when learning over the field potentially have weights outside the PID.

<sup>7</sup> Note that factorisations exist as each principal ideal domain is also a unique factorisation domain, cf. e.g. [17, Thm. 2.23].

Algorithmic issues with WFAs over arbitrary semirings have been identified before. For instance, Krob [18] showed that language equivalence is undecidable for WFAs over the tropical semiring.

On the technical level, a variation on WFAs is given by probabilistic automata, where transitions point to convex rather than linear combinations of states. One easily adapts the example from Section 5 to show that learning probabilistic automata has a similar termination issue. On the positive side, Tappler et al. [26] have shown that deterministic MDPs can be learned using an L\*-based algorithm. The deterministic MDPs in *loc. cit.* are very different from the automata in our paper, as their states generate observable output that makes it possible to identify the current state from the generated input-output sequence.

One drawback of the ascending chain condition on the Hankel matrix is that this does not give any indication of the number of steps the algorithm requires. Indeed, the submodule chains traversed, although converging, may be arbitrarily long. We would like to measure and bound the progress made when fixing closedness defects, but this turns out to be challenging for PIDs. The rank of the module generated by the table may not increase. We leave an investigation of alternative measures to future work.

We would also like to adapt the algorithm so that for PIDs it always produces minimal automata. At the moment this is already the case for fields,<sup>8</sup> since adding a row due to a closedness defect preserves linear independence of the image of row. For PIDs things are more complicated: adding rows towards closedness may break linear independence, and thus a basis needs to be found within the image of row. This complicates the construction of the hypothesis.

Our results show that, on the one hand, WFAs can be learned over finite semirings and arbitrary PIDs (assuming computability of the relevant operations) and, on the other hand, that there exists an infinite commutative semiring over which they cannot be learned. However, there are many classes of semirings between commutative semirings and PIDs, and we would like to know whether their WFAs can be learned by our general algorithm.

Finally, we would like to generalise our results to extend the framework introduced in [13], which focusses on learning automata with side-effects over a monad. WFAs as considered in the present paper are an instance of those, where the monad is the free semimodule monad V (−). At the moment, the results in [13] apply to a monad that preserves finite sets, but much of our general WFA learning algorithm and termination argument can be extended to that setting. It would be interesting to see if crucial properties of PIDs that lead to a progress measure and to satisfying the ascending chain condition could also be translated to the monad level.

*Acknowledgments.* We thank Joshua Moerman for comments and discussions.

<sup>8</sup> There is one exception: the language that assigns 0 to every word, which is accepted by a WFA with no states. The algorithm initialises the set of row labels, which constitute the state space of the hypothesis, with the empty word.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **The Polynomial Complexity of Vector Addition Systems with States**

Florian Zuleger (zuleger@forsyte.tuwien.ac.at)

TU Wien

**Abstract.** Vector addition systems are an important model in theoretical computer science and have been used in a variety of areas. In this paper, we consider vector addition systems with states over a parameterized initial configuration. For these systems, we are interested in the standard notion of computational time complexity, i.e., we want to understand the length of the longest trace for a fixed vector addition system with states depending on the size of the initial configuration. We show that the asymptotic complexity of a given vector addition system with states is either $\Theta(N^k)$ for some computable integer $k$, where $N$ is the size of the initial configuration, or at least exponential. We further show that $k$ can be computed in polynomial time in the size of the considered vector addition system. Finally, we show that $1 \leq k \leq 2^n$, where $n$ is the dimension of the considered vector addition system.

## **1 Introduction**

Vector addition systems (VASs) [13], which are equivalent to Petri nets, are a popular model for the analysis of parallel processes [7]. Vector addition systems with states (VASSs) [10] are an extension of VASs with a finite control and are a popular model for the analysis of concurrent systems, because the finite control can for example be used to model shared global memory [12]. In this paper, we consider VASSs over a parameterized initial configuration. For these systems, we are interested in the standard notion of computational time complexity, i.e., we want to understand the length of the longest execution for a fixed VASS depending on the size of the initial configuration. VASSs over a parameterized initial configuration naturally arise in two areas: 1) *The parameterized verification problem.* For concurrent systems the number of system processes is often not known in advance, and thus the system is designed such that a template process can be instantiated an arbitrary number of times. The problem of analyzing the concurrent system for all possible system sizes is a common theme in the literature [9, 8, 1, 11, 4, 2, 3]. 2) *Automated complexity analysis of programs.* VASSs (and generalizations) have been used as backend in program analysis tools for automated complexity analysis [18–20]. The VASS considered by these tools are naturally parameterized over the initial configuration, modelling the dependency of the program complexity on the program input. The cited papers have proposed practical techniques but did not give complete algorithms.

J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 622–641, 2020. https://doi.org/10.1007/978-3-030-45231-5\_32

Two recent papers have considered the computational time complexity of VASSs over a parameterized initial configuration. [15] presents a PTIME procedure for deciding whether a VASS is polynomial or at least exponential, but does not give a precise analysis in the polynomial case. [5] establishes the precise asymptotic complexity for the special case of VASSs whose configurations are linearly bounded in the size of the initial configuration. In this paper, we generalize both results and fully characterize the asymptotic behaviour of VASSs with polynomial complexity: we show that the asymptotic complexity of a given VASS is either $\Theta(N^k)$ for some computable integer $k$, where $N$ is the size of the initial configuration, or at least exponential. We further show that $k$ can be computed in PTIME in the size of the considered VASS. Finally, we show that $1 \leq k \leq 2^n$, where $n$ is the dimension of the considered VASS.

#### **1.1 Overview and Illustration of Results**

We discuss our approach on the VASS $\mathcal{V}_{run}$, stated in Figure 1, which will serve as running example. The VASS has dimension 3 (i.e., the vectors annotating the transitions have dimension 3) and four states $s_1, s_2, s_3, s_4$. In this paper we will always represent vectors using a set of variables *Var*, whose cardinality equals the dimension of the VASS. For $\mathcal{V}_{run}$ we choose *Var* $= \{x, y, z\}$ and use $x, y, z$ as indices for the first, second and third component of 3-dimensional vectors. The configurations of a VASS are pairs of states and valuations of the variables to non-negative integers. A step of a VASS moves along a transition from the current state to a successor state, and adds the vector labelling the transition to the current valuation; a step can only be taken if the resulting valuation is non-negative. For the computational time complexity analysis of VASSs, we consider traces (sequences of steps) whose initial configurations consist of a valuation whose maximal value is bounded by $N$ (the parameter used for bounding the size of the initial configuration). The computational time complexity is then the length of the longest trace whose initial configuration is bounded by $N$. For ease of exposition, we will in this paper only consider VASSs whose control-flow graph is *connected*. (For the general case, we remark that one needs to decompose a VASS into its strongly-connected components (SCCs), which can then be analyzed in isolation, following the DAG-order of the SCC decomposition; for this, one needs to slightly generalize the analysis in this paper to initial configurations with values $\Theta(N^{k_x})$ for every variable $x \in$ *Var*, where $k_x \in \mathbb{Z}$.)
For ease of exposition, we further consider traces over arbitrary initial states (instead of some fixed initial state); this is justified because for a fixed initial state one can always restrict the control-flow graph to the reachable states, and then the two options result in the same notion of computational complexity (up to a constant offset, which is not relevant for our asymptotic analysis).

In order to analyze the computational time complexity of a considered VASS, our approach computes *variable bounds* and *transition bounds*. A variable bound is the maximal value of a variable reachable by any trace whose initial configuration is bounded by $N$. A transition bound is the maximal number of times a transition appears in any trace whose initial configuration is bounded by $N$. For $\mathcal{V}_{run}$, our approach establishes the linear variable bound $\Theta(N)$ for $x$ and $y$, and the quadratic bound $\Theta(N^2)$ for $z$. We note that because the variable bound of $z$ is quadratic and not linear, $\mathcal{V}_{run}$ cannot be analyzed by the procedure of [5]. Our approach establishes the bound $\Theta(N)$ for the transitions $s_1 \to s_3$ and $s_4 \to s_2$, the bound $\Theta(N^2)$ for the transitions $s_1 \to s_2$, $s_2 \to s_1$, $s_3 \to s_4$, $s_4 \to s_3$, and the bound $\Theta(N^3)$ for all self-loops. The computational complexity of $\mathcal{V}_{run}$ is then the maximum of all transition bounds, i.e., $\Theta(N^3)$. In general, our main algorithm (Algorithm 1 presented in Section 4) either establishes that the VASS under analysis has at least exponential complexity or computes asymptotically precise variable and transition bounds $\Theta(N^k)$, with $k$ computable in PTIME and $1 \leq k \leq 2^n$, where $n$ is the dimension of the considered VASS. We note that our upper bound $2^n$ also improves the analysis of [15], which reports an exponential dependence on the number of transitions (and not only on the dimension).

We further state a family $\mathcal{V}_n$ of VASSs which illustrates that $k$ can indeed be exponential in the dimension (the example can be skipped on first reading). $\mathcal{V}_n$ uses variables $x_{i,j}$ and consists of states $s_{i,j}$, for $1 \leq i \leq n$ and $j = 1, 2$. We note that $\mathcal{V}_n$ has dimension $2n$. $\mathcal{V}_n$ consists of the transitions


$\mathcal{V}_{exp}$ in Figure 1 depicts $\mathcal{V}_n$ for $n = 3$, where the vector components are stated in the order $x_{1,1}, x_{1,2}, x_{2,1}, x_{2,2}, x_{3,1}, x_{3,2}$. It is not hard to verify for all $1 \leq i \leq n$ that $\Theta(N^{2^{i-1}})$ is the precise asymptotic variable bound for $x_{i,1}$ and $x_{i,2}$ as well as the precise asymptotic transition bound for $s_{i,1} \to s_{i,2}$, $s_{i,2} \to s_{i,1}$ and, in case $i < n$, for $s_{i,1} \to s_{i+1,1}$, $s_{i+1,2} \to s_{i,2}$, and that $\Theta(N^{2^i})$ is the precise asymptotic transition bound for $s_{i,1} \to s_{i,1}$, $s_{i,2} \to s_{i,2}$ (Algorithm 1 can be used to find these bounds).

#### **1.2 Related Work**

A celebrated result on VASs is the EXPSPACE-completeness [16, 17] of the boundedness problem. Deciding termination for a VAS with a *fixed* initial configuration can be reduced to the boundedness problem, and is therefore also EXPSPACE-complete; this also applies to VASSs, whose termination problem can be reduced to the VAS termination problem. In contrast, deciding the termination of VASSs for *all* initial configurations is in PTIME. It is not hard to see that non-termination over all initial configurations is equivalent to the existence of non-negative cycles (e.g., using Dickson's Lemma [6]). Kosaraju and Sullivan have given a PTIME procedure for the detection of zero-cycles [14], which can easily be adapted to non-negative cycles. The existence of zero-cycles is decided

**Fig. 1.** VASS Vrun (left) and VASS Vexp (right)

by the repeated use of a constraint system in order to remove transitions that can definitely not be part of a zero-cycle. The algorithm of Kosaraju and Sullivan forms the basis for both cited papers [15, 5], as well as the present paper.

A line of work [18–20] has used VASSs (and their generalizations) as backends for the automated complexity analysis of C programs. These algorithms have been designed for practical applicability, but are not complete and no theoretical analysis of their precision has been given. We point out, however, that these papers have inspired the Bound Proof Principle in Section 5.

## **2 Preliminaries**

*Basic Notation.* For a set $X$ we denote by $|X|$ the number of elements of $X$. Let $S$ be either $\mathbb{N}$ or $\mathbb{Z}$. We write $S^I$ for the set of vectors over $S$ indexed by some set $I$. We write $S^{I \times J}$ for the set of matrices over $S$ indexed by $I$ and $J$. We write $\mathbf{1}$ for the vector which has entry 1 in every component. Given $a \in S^I$, we write $a(i) \in S$ for the entry at line $i \in I$ of $a$, and $\|a\| = \max_{i \in I} |a(i)|$ for the maximum absolute value of $a$. Given $a \in S^I$ and $J \subseteq I$, we denote by $a|_J \in S^J$ the restriction of $a$ to $J$, i.e., we set $a|_J(i) = a(i)$ for all $i \in J$. Given $A \in S^{I \times J}$, we write $A(j)$ for the vector in column $j \in J$ of $A$ and $A(i, j) \in S$ for the entry at line $i \in I$ and column $j \in J$ of $A$. Given $A \in S^{I \times J}$ and $K \subseteq J$, we denote by $A|_K \in S^{I \times K}$ the restriction of $A$ to $K$, i.e., we set $A|_K(i, j) = A(i, j)$ for all $(i, j) \in I \times K$. We write $\mathbf{Id}$ for the square matrix which has entries 1 on the diagonal and 0 otherwise.

Given $a, b \in S^I$ we write $a + b \in S^I$ for component-wise addition, $c \cdot a \in S^I$ for multiplying every component of $a$ by some $c \in S$ and $a \geq b$ for component-wise comparison. Given $A \in S^{I \times J}$, $B \in S^{J \times K}$ and $x \in S^J$, we write $AB \in S^{I \times K}$ for the standard matrix multiplication, $Ax \in S^I$ for the standard matrix-vector multiplication, $A^T \in S^{J \times I}$ for the transposed matrix of $A$ and $x^T \in S^{1 \times J}$ for the transposed vector of $x$.
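The set-indexed vectors and restrictions above are conveniently modelled by dictionaries keyed by the index set; a small illustration (the helper names are ours):

```python
# Vectors over S indexed by a set I, modelled as dicts from indices to entries.
a = {"x": 3, "y": -5, "z": 0}  # an element of Z^Var for Var = {x, y, z}

def norm(a):
    """The maximum absolute value ||a|| = max over i in I of |a(i)|."""
    return max(abs(v) for v in a.values())

def restrict(a, J):
    """The restriction a|_J: keep only the components indexed by J, a subset of I."""
    return {i: a[i] for i in J}

print(norm(a))                                      # → 5
print(restrict(a, {"x", "z"}) == {"x": 3, "z": 0})  # → True
```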

*Vector Addition System with States (VASS).* Let *Var* be a finite set of variables. A vector addition system with states (VASS) $\mathcal{V} = (\mathit{St}(\mathcal{V}), \mathit{Trns}(\mathcal{V}))$ consists of a finite set of *states* $\mathit{St}(\mathcal{V})$ and a finite set of *transitions* $\mathit{Trns}(\mathcal{V})$, where $\mathit{Trns}(\mathcal{V}) \subseteq \mathit{St}(\mathcal{V}) \times \mathbb{Z}^{\mathit{Var}} \times \mathit{St}(\mathcal{V})$; we call $n = |\mathit{Var}|$ the *dimension* of $\mathcal{V}$. We write $s_1 \xrightarrow{d} s_2$ to denote a transition $(s_1, d, s_2) \in \mathit{Trns}(\mathcal{V})$; we call the vector $d$ the *update* of transition $s_1 \xrightarrow{d} s_2$. A *path* $\pi$ of $\mathcal{V}$ is a finite sequence $s_0 \xrightarrow{d_1} s_1 \xrightarrow{d_2} \cdots s_k$ with $s_i \xrightarrow{d_{i+1}} s_{i+1} \in \mathit{Trns}(\mathcal{V})$ for all $0 \leq i < k$. We define the *length* of $\pi$ by $\mathit{length}(\pi) = k$ and the *value* of $\pi$ by $\mathit{val}(\pi) = \sum_{i \in [1,k]} d_i$. Let $\mathrm{instance}(\pi, t)$ be the number of times $\pi$ contains the transition $t$, i.e., the number of indices $i$ such that $t = s_i \xrightarrow{d_i} s_{i+1}$. We remark that $\mathit{length}(\pi) = \sum_{t \in \mathit{Trns}(\mathcal{V})} \mathrm{instance}(\pi, t)$ for every path $\pi$ of $\mathcal{V}$. Given a finite path $\pi_1$ and a path $\pi_2$ such that the last state of $\pi_1$ equals the first state of $\pi_2$, we write $\pi = \pi_1\pi_2$ for the path obtained by joining the last state of $\pi_1$ with the first state of $\pi_2$; we call $\pi$ the *concatenation* of $\pi_1$ and $\pi_2$, and $\pi_1\pi_2$ a *decomposition* of $\pi$. We say $\pi'$ is a *sub-path* of $\pi$, if there is a decomposition $\pi = \pi_1 \pi' \pi_2$ for some $\pi_1, \pi_2$. A *cycle* is a path that has the same start- and end-state. A *multi-cycle* is a finite set of cycles. The value $\mathit{val}(M)$ of a multi-cycle $M$ is the sum of the values of its cycles. $\mathcal{V}$ is *connected*, if for all $s, s' \in \mathit{St}(\mathcal{V})$ there is a path from $s$ to $s'$. A VASS $\mathcal{V}'$ is a *sub-VASS* of $\mathcal{V}$, if $\mathit{St}(\mathcal{V}') \subseteq \mathit{St}(\mathcal{V})$ and $\mathit{Trns}(\mathcal{V}') \subseteq \mathit{Trns}(\mathcal{V})$. Sub-VASSs $\mathcal{V}_1$ and $\mathcal{V}_2$ are *disjoint*, if $\mathit{St}(\mathcal{V}_1) \cap \mathit{St}(\mathcal{V}_2) = \emptyset$. A *strongly-connected component (SCC)* of a VASS $\mathcal{V}$ is a maximal sub-VASS $\mathcal{S}$ of $\mathcal{V}$ such that $\mathcal{S}$ is connected and $\mathit{Trns}(\mathcal{S}) \neq \emptyset$.

Let $\mathcal{V}$ be a VASS. The set of *valuations* $\mathit{Val}(\mathcal{V}) = \mathbb{N}^{\mathit{Var}}$ consists of *Var*-indexed vectors over the natural numbers (we assume $\mathbb{N}$ includes 0). The set of *configurations* $\mathit{Cfg}(\mathcal{V}) = \mathit{St}(\mathcal{V}) \times \mathit{Val}(\mathcal{V})$ consists of pairs of states and valuations. A *step* is a triple $((s_1, \nu_1), d, (s_2, \nu_2)) \in \mathit{Cfg}(\mathcal{V}) \times \mathbb{Z}^{\dim(\mathcal{V})} \times \mathit{Cfg}(\mathcal{V})$ such that $\nu_2 = \nu_1 + d$ and $s_1 \xrightarrow{d} s_2 \in \mathit{Trns}(\mathcal{V})$. We write $(s_1, \nu_1) \xrightarrow{d} (s_2, \nu_2)$ to denote a step $((s_1, \nu_1), d, (s_2, \nu_2))$ of $\mathcal{V}$. A *trace* of $\mathcal{V}$ is a finite sequence $\zeta = (s_0, \nu_0) \xrightarrow{d_1} (s_1, \nu_1) \xrightarrow{d_2} \cdots (s_k, \nu_k)$ of steps. We lift the notions of length and instances from paths to traces in the obvious way: we consider the path $\pi = s_0 \xrightarrow{d_1} s_1 \xrightarrow{d_2} \cdots s_k$ that consists of the transitions used by $\zeta$, and set $\mathit{length}(\zeta) := \mathit{length}(\pi)$ and $\mathrm{instance}(\zeta, t) := \mathrm{instance}(\pi, t)$ for all $t \in \mathit{Trns}(\mathcal{V})$. We denote by $\mathrm{init}(\zeta) = \|\nu_0\|$ the maximum absolute value of the starting valuation $\nu_0$ of $\zeta$. We say that $\zeta$ *reaches* a valuation $\nu$, if $\nu = \nu_k$. The *complexity* of $\mathcal{V}$ is the function $\mathit{comp}_{\mathcal{V}}(N) = \sup_{\text{trace } \zeta \text{ of } \mathcal{V},\ \mathrm{init}(\zeta) \leq N} \mathit{length}(\zeta)$, which returns for every $N \geq 0$ the supremum over the lengths of the traces $\zeta$ with $\mathrm{init}(\zeta) \leq N$. The *variable bound* of a variable $x \in$ *Var* is the function $\mathrm{vbound}_x(N) = \sup_{\text{trace } \zeta \text{ of } \mathcal{V},\ \mathrm{init}(\zeta) \leq N,\ \zeta \text{ reaches valuation } \nu} \nu(x)$, which returns for every $N \geq 0$ the supremum over the values of $x$ reachable by traces $\zeta$ with $\mathrm{init}(\zeta) \leq N$. The *transition bound* of a transition $t \in \mathit{Trns}(\mathcal{V})$ is the function $\mathrm{tbound}_t(N) = \sup_{\text{trace } \zeta \text{ of } \mathcal{V},\ \mathrm{init}(\zeta) \leq N} \mathrm{instance}(\zeta, t)$, which returns for every $N \geq 0$ the supremum over the number of instances of $t$ in traces $\zeta$ with $\mathrm{init}(\zeta) \leq N$.
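The step and trace definitions above are straightforward to execute. Below is a minimal sketch for a toy one-state VASS with a single self-loop that decrements $x$ (our own example, not $\mathcal{V}_{run}$): starting from $\mathrm{init}(\zeta) = N$, the self-loop can fire exactly $N$ times, so this VASS has complexity $N$.

```python
# A toy VASS: one state "s" with a single self-loop that decrements x.
# Valuations are dicts over Var; transitions are (source, update, target).
transitions = [("s", {"x": -1}, "s")]

def step(config, t):
    """Take transition t from config; return None if the step is not enabled
    (a step requires the resulting valuation to be non-negative)."""
    (state, val), (s1, d, s2) = config, t
    if state != s1:
        return None
    new_val = {v: val[v] + d.get(v, 0) for v in val}
    if any(c < 0 for c in new_val.values()):
        return None
    return (s2, new_val)

def greedy_trace_length(config):
    """Length of the trace obtained by always firing the first enabled transition."""
    length = 0
    while True:
        nxt = next((c for t in transitions
                    if (c := step(config, t)) is not None), None)
        if nxt is None:
            return length
        config, length = nxt, length + 1

# Starting from init = 7, the self-loop fires exactly 7 times.
print(greedy_trace_length(("s", {"x": 7})))  # → 7
```

For VASSs with genuine nondeterminism a greedy trace only gives a lower bound on the complexity; computing the supremum over all traces is exactly what the paper's analysis is about.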

*Rooted Tree.* A *rooted tree* is a connected undirected acyclic graph in which one node has been designated as the root. We will usually denote the root by $\iota$. We note that for every node $\eta$ in a rooted tree there is a unique path from $\eta$ to the root. The *parent* of a node $\eta \neq \iota$ is the node connected to $\eta$ on the path to the root. Node $\eta$ is a *child* of a node $\eta'$, if $\eta'$ is the parent of $\eta$. $\eta'$ is a *descendent* of $\eta$, if $\eta$ lies on the path from $\eta'$ to the root; $\eta'$ is a *strict* descendent, if furthermore $\eta' \neq \eta$. $\eta$ is an *ancestor* of $\eta'$, if $\eta'$ is a descendent of $\eta$; $\eta$ is a *strict* ancestor, if furthermore $\eta' \neq \eta$. The *distance* of a node $\eta$ to the root is the number of nodes $\neq \eta$ on the path from $\eta$ to the root. We denote by $\mathrm{layer}(l)$ the set of all nodes with the same distance $l$ to the root; we remark that $\mathrm{layer}(0) = \{\iota\}$.

For space reasons, all proofs are presented in the extended version [21].

## **3 A Dichotomy Result**

We will make use of the following matrices associated to a VASS throughout the paper: Let V be a VASS. We define the *update matrix* $D \in \mathbb{Z}^{Var \times Trns(V)}$ by setting $D(t) = d$ for all transitions $t = (s,d,s') \in Trns(V)$. We define the *flow matrix* $F \in \mathbb{Z}^{St(V) \times Trns(V)}$ by setting $F(s,t) = -1$ and $F(s',t) = 1$ for transitions $t = (s,d,s')$ with $s' \ne s$, and $F(s,t) = F(s',t) = 0$ for transitions $t = (s,d,s')$ with $s' = s$; in both cases we further set $F(s'',t) = 0$ for all states $s''$ with $s'' \ne s$ and $s'' \ne s'$. We note that every column $t$ of $F$ either contains exactly one $-1$ and one $1$ entry (in case the source and target of transition $t$ are different) or only $0$ entries (in case the source and target of transition $t$ are the same).
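The definitions of D and F can be illustrated concretely. The following is a minimal sketch (the dictionary-based matrix representation and the column indexing by transition position are ours) that builds both matrices from a transition list:

```python
def update_and_flow(states, variables, transitions):
    """Build the update matrix D and flow matrix F as dictionaries.

    `transitions` is a list of triples (s, d, s2), where d maps each
    variable to its integer update; column i is transition i."""
    D = {(x, i): d[x]
         for i, (s, d, s2) in enumerate(transitions) for x in variables}
    F = {(q, i): 0 for q in states for i in range(len(transitions))}
    for i, (s, d, s2) in enumerate(transitions):
        if s != s2:                     # loops contribute a zero column
            F[(s, i)] = -1
            F[(s2, i)] = 1
    return D, F

# Two states, one variable: one loop and one transfer transition.
trs = [("s1", {"x": -1}, "s1"), ("s1", {"x": 0}, "s2")]
D, F = update_and_flow(["s1", "s2"], ["x"], trs)
assert D[("x", 0)] == -1                       # loop decrements x
assert F[("s1", 0)] == 0 and F[("s2", 0)] == 0  # loop: zero column
assert F[("s1", 1)] == -1 and F[("s2", 1)] == 1  # transfer: one -1, one 1
```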

*Example 1.* We state the update and flow matrix for Vrun from Section 1:

$$D = \begin{pmatrix} -1 & 1 & -1 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\ 1 & -1 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & 0 & 0 \end{pmatrix}, \quad F = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & -1 \end{pmatrix},$$
with column order $s_1 \to s_1$, $s_2 \to s_2$, $s_3 \to s_3$, $s_4 \to s_4$, $s_2 \to s_1$, $s_1 \to s_2$, $s_4 \to s_3$, $s_3 \to s_4$, $s_1 \to s_3$, $s_4 \to s_2$ (from left to right) and row order $x, y, z$ for $D$ resp. $s_1, s_2, s_3, s_4$ for $F$ (from top to bottom).

We now consider the constraint systems (P) and (Q), stated below, which have maximization objectives. The constraint systems will be used by our main algorithm in Section 4. We observe that both constraint systems are always satisfiable (set all coefficients to zero) and that the solutions of both constraint systems are closed under addition. Hence, the number of inequalities for which the maximization objective is satisfied is unique for optimal solutions of both constraint systems. The maximization objectives can be implemented by suitable linear objective functions. Hence, both constraint systems can be solved in PTIME over the integers, because we can use linear programming over the rationals and then scale rational solutions to the integers by multiplying with the least common multiple of the denominators.
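The scaling step in the last sentence can be sketched as follows; a minimal illustration (our own representation) assuming the rational LP solution is given as a list of `Fraction`s:

```python
from fractions import Fraction
from math import lcm

def integer_scale(solution):
    """Scale a rational LP solution to an integer one.

    Solutions of (P)/(Q) are closed under addition, hence under scaling by
    positive integers; multiplying by the least common multiple of all
    denominators therefore preserves feasibility and yields integers."""
    m = lcm(*(f.denominator for f in solution))
    return [int(f * m) for f in solution]

mu = [Fraction(1, 2), Fraction(2, 3), Fraction(0)]
assert integer_scale(mu) == [3, 4, 0]   # scaled by lcm(2, 3, 1) = 6
```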

constraint system (P): there exists $\mu \in \mathbb{Z}^{Trns(V)}$ with
$$D\mu \ge 0 \qquad \mu \ge 0 \qquad F\mu = 0$$
Maximization Objective: Maximize the number of inequalities with $(D\mu)(x) > 0$ and $\mu(t) > 0$.

constraint system (Q): there exist $r \in \mathbb{Z}^{Var}$, $z \in \mathbb{Z}^{St(V)}$ with
$$r \ge 0 \qquad z \ge 0 \qquad D^T r + F^T z \le 0$$
Maximization Objective: Maximize the number of inequalities with $r(x) > 0$ and $(D^T r + F^T z)(t) < 0$.

The solutions of (P) and (Q) are characterized by the following two lemmata:

**Lemma 2 (Cited from [14]).** $\mu \in \mathbb{Z}^{Trns(V)}$ *is a solution to constraint system (*P*) iff there exists a multi-cycle* M *with val*(M) ≥ 0 *and* μ(t) *instances of transition* t *for every* t ∈ *Trns*(V)*.*

**Lemma 3 (Cited from [5]**1**).** *Let* r, z *be a solution to constraint system (*Q*). Let rank*(r, z) : *Cfg*(V) → $\mathbb{N}$ *be the function defined by rank*$(r,z)(s,\nu) = r^T \nu + z(s)$*. Then, rank*(r, z) *is a* quasi-ranking function *for* V*, i.e., we have*

*1) rank*$(r,z)(s,\nu) \ge 0$ *for all* (s, ν) ∈ *Cfg*(V)*, and*

*2) rank*$(r,z)(s_1,\nu_1) \ge$ *rank*$(r,z)(s_2,\nu_2)$ *for every step* $(s_1,\nu_1) \xrightarrow{d} (s_2,\nu_2)$ *of* V *with* $t = s_1 \xrightarrow{d} s_2$*; the inequality is strict if* $(D^T r + F^T z)(t) < 0$*.*
We now state a dichotomy between optimal solutions to constraint systems (P) and (Q), which is obtained by an application of Farkas' Lemma. This dichotomy is the main reason why we are able to compute the precise asymptotic complexity of VASSs with polynomial bounds.

<sup>1</sup> There is no explicit lemma with this statement in [5], however the lemma is implicit in the exposition of Section 4 in [5]. We further note that [5] does not include the constraint z ≥ 0. This difference is minor; the constraint was added in order to ensure that ranking functions always return non-negative values, which is more standard than the choice of [5]. A proof of the lemma can be found in the extended version [21].

**Lemma 4.** *Let* r, z *be an optimal solution to constraint system (*Q*) and let* μ *be an optimal solution to constraint system (*P*). Then, for all variables* x ∈ *Var we either have* r(x) > 0 *or* (Dμ)(x) ≥ 1*, and for all transitions* t ∈ *Trns*(V) *we either have* $(D^T r + F^T z)(t) < 0$ *or* μ(t) ≥ 1*.*

*Example 5.* Our main algorithm, Algorithm 1 presented in Section 4, will directly use constraint systems (P) and (Q) in its first loop iteration, and adjusted versions in later loop iterations. Here, we illustrate the first loop iteration. We consider the running example Vrun, whose update and flow matrices we have stated in Example 1. An optimal solution to constraint systems (P) and (Q) is given by $\mu = (1\,4\,4\,1\,1\,1\,1\,1\,0\,0)^T$ and $r = (2\,2\,0)^T$, $z = (0\,0\,1\,1)^T$. The quasi-ranking function *rank*(r, z) immediately establishes that $\mathsf{tbound}_t(N) \in O(N)$ for $t = s_1 \to s_3$ and $t = s_4 \to s_2$, because 1) *rank*(r, z) decreases for these two transitions and does not increase for the other transitions (by Lemma 3), and because 2) the initial value of *rank*(r, z) is bounded by O(N), i.e., we have *rank*(r, z)(s, ν) ∈ O(N) for every state s ∈ *St*(Vrun) and every valuation ν with $\lVert\nu\rVert \le N$. By a similar argument we get $\mathsf{vbound}_x(N) \in O(N)$ and $\mathsf{vbound}_y(N) \in O(N)$. The exact reasoning for deriving upper bounds is given in Section 5. From μ we can, by Lemma 2, obtain the cycles $C_1 = s_1 \to s_2 \to s_2 \to s_2 \to s_2 \to s_2 \to s_1 \to s_1$ and $C_2 = s_3 \to s_3 \to s_3 \to s_3 \to s_3 \to s_4 \to s_4 \to s_3$ with $val(C_1) + val(C_2) \ge (0\,0\,1)^T$ (\*).
We will later show that the cycles C₁ and C₂ give rise to a family of traces that establish $\mathsf{tbound}_t(N) \in \Omega(N^2)$ for all transitions t ∈ *Trns*(Vrun) with $t \ne s_1 \to s_3$ and $t \ne s_4 \to s_2$. Here we give an intuition on the construction: We consider a cycle C of Vrun that visits all states at least once. By (\*), the updates to x and y along the cycles C₁ and C₂ cancel each other out. However, the two cycles are not connected. Hence, we execute the cycle C₁ some Ω(N) times, then (a part of) the cycle C, then execute C₂ as often as C₁, and finally the remaining part of C; this we repeat Ω(N) times. This construction also establishes the bound $\mathsf{vbound}_z(N) \in \Omega(N^2)$ because, by (\*), we increase z with every joint execution of C₁ and C₂. The precise lower bound construction is given in Section 6.

## **4 Main Algorithm**

Our main algorithm – Algorithm 1 – computes the complexity as well as variable and transition bounds of an input VASS V, either detecting that V has at least exponential complexity or reporting precise asymptotic bounds for the transitions and variables of V (up to a constant factor): Algorithm 1 will compute values $\mathsf{vExp}(x) \in \mathbb{N}$ such that $\mathsf{vbound}_x(N) \in \Theta(N^{\mathsf{vExp}(x)})$ for every x ∈ *Var* and values $\mathsf{tExp}(t) \in \mathbb{N}$ such that $\mathsf{tbound}_t(N) \in \Theta(N^{\mathsf{tExp}(t)})$ for every t ∈ *Trns*(V).

*Data Structures.* The algorithm maintains a rooted tree T. Every node η of T will always be labelled by a sub-VASS VASS(η) of V. The nodes in the same layer of T will always be labelled by disjoint sub-VASSs of V. The main loop of Algorithm 1 will extend T by one layer per loop iteration. The variable l always contains the next layer that is going to be added to T. For computing variable and transition bounds, Algorithm 1 maintains the functions vExp : *Var* → $\mathbb{N} \cup \{\infty\}$ and tExp : *Trns*(V) → $\mathbb{N} \cup \{\infty\}$.

*Initialization.* We assume D to be the update matrix and F to be the flow matrix associated to V as discussed in Section 3. At initialization, T consists of the root node <sup>ι</sup> and we set VASS(ι) = <sup>V</sup>, i.e., the root is labelled by the input <sup>V</sup>. We initialize l = 1 as Algorithm 1 is going to add layer 1 to T in the first loop iteration. We initialize vExp(x) = <sup>∞</sup> for all variables <sup>x</sup> <sup>∈</sup> *Var* and tExp(t) = <sup>∞</sup> for all transitions t ∈ *Trns*(V).

*The constraint systems solved during each loop iteration.* In loop iteration l, Algorithm 1 will set tExp(t) := l for some transitions t and vExp(x) := l for some variables x. In order to determine those transitions and variables, Algorithm 1 instantiates constraint systems (P) and (Q) from Section 3 over the set of transitions $U = \bigcup_{\eta \in \mathsf{layer}(l-1)} Trns(\mathsf{VASS}(\eta))$, which contains all transitions associated to nodes in layer l − 1 of T. However, instead of a direct instantiation using $D|_U$ and $F|_U$ (i.e., the restriction of D and F to the transitions U), we need to work with an extended set of variables and an extended update matrix. We set $Var_{ext} := \{(x,\eta) \mid \eta \in \mathsf{layer}(l - \mathsf{vExp}(x))\}$, where we set $n - \infty = 0$ for all $n \in \mathbb{N}$. This means that we use a different copy of variable x for every node η in layer $l - \mathsf{vExp}(x)$. We note that for a variable x with vExp(x) = ∞ there is only a single copy of x in $Var_{ext}$ because ι ∈ layer(0) is the only node in layer 0. We define the extended update matrix $D_{ext} \in \mathbb{Z}^{Var_{ext} \times U}$ by setting

$$D\_{ext}((x,\eta),t) := \begin{cases} D(x,t), \text{ if } t \in Trns(\mathsf{VASS}(\eta)),\\ 0, & \text{otherwise.} \end{cases}$$
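This definition can be sketched directly in code; a toy illustration with our own data representation, where `D` maps pairs (variable, transition) to integer updates:

```python
def extended_update(D, nodes_of_var, trns_of):
    """Build the extended update matrix D_ext over variable copies (x, eta).

    `nodes_of_var` maps each variable x to the nodes eta of layer
    l - vExp(x); `trns_of[eta]` is the transition set of VASS(eta).
    The copy (x, eta) carries the update of x on transitions of
    VASS(eta) and 0 everywhere else."""
    D_ext = {}
    for x, nodes in nodes_of_var.items():
        for eta in nodes:
            for (x2, t), val in D.items():
                if x2 == x:
                    D_ext[((x, eta), t)] = val if t in trns_of[eta] else 0
    return D_ext

# One variable, two transitions; node "e1" only contains transition "t1".
D = {("x", "t1"): -1, ("x", "t2"): 1}
D_ext = extended_update(D, {"x": ["e1"]}, {"e1": {"t1"}})
assert D_ext[(("x", "e1"), "t1")] == -1
assert D_ext[(("x", "e1"), "t2")] == 0
```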

Constraint systems (*I*) and (*II*) stated in Figure 2 can be recognized as instantiations of constraint systems (P) and (Q) with matrices $D_{ext}$ and $F|_U$ and variables $Var_{ext}$, and hence the dichotomy stated in Lemma 4 holds.

We comment on the choice of $Var_{ext}$: Setting $Var_{ext} = \{(x,\eta) \mid \eta \in \mathsf{layer}(i)\}$ for any $i \le l - \mathsf{vExp}(x)$ would result in correct upper bounds (while $i > l - \mathsf{vExp}(x)$ would not). However, choosing $i < l - \mathsf{vExp}(x)$ does in general result in sub-optimal bounds because fewer variables make constraint system (*I*) easier and constraint system (*II*) harder to satisfy (in terms of their maximization objectives). In fact, $i = l - \mathsf{vExp}(x)$ is the optimal choice, because this choice allows us to prove corresponding lower bounds in Section 6. We will further comment on key properties of constraint systems (*I*) and (*II*) in Sections 5 and 6, when we outline the proofs of the upper resp. lower bounds.

We note that Algorithm 1 does not use the optimal solution μ to constraint system (*I*) for the computation of the vExp(x) and tExp(t), and hence the computation of the optimal solution μ could be removed from the algorithm. The solution μ is however needed for the extraction of lower bounds in Sections 6 and 8, and this is the reason why it is stated here. The extraction of lower bounds is not explicitly added to the algorithm in order to not clutter the presentation.

*Discovering transition bounds.* After an optimal solution r, z to constraint system (*II*) has been found, Algorithm 1 collects all transitions t with $(D_{ext}^T r + F|_U^T z)(t) < 0$ in the set R (note that the optimization criterion in constraint system (*II*) tries to find as many such t as possible). Algorithm 1 then sets tExp(t) := l for all t ∈ R. The transitions in R will not be part of layer l of T.

```
Input: a connected VASS V with update matrix D and flow matrix F

T := single root node ι with VASS(ι) = V;  l := 1;
vExp(x) := ∞ for all variables x ∈ Var;  tExp(t) := ∞ for all transitions t ∈ Trns(V);
repeat
    let U := ⋃_{η ∈ layer(l−1)} Trns(VASS(η));
    let Var_ext := {(x, η) | η ∈ layer(l − vExp(x))}, where n − ∞ = 0 for n ∈ ℕ;
    let D_ext ∈ Z^{Var_ext × U} be the matrix defined by
        D_ext((x, η), t) = D(x, t), if t ∈ Trns(VASS(η)), and 0 otherwise;
    find optimal solutions μ and r, z to constraint systems (I) and (II);
    let R := {t ∈ U | (D_ext^T r + F|_U^T z)(t) < 0};
    set tExp(t) := l for all t ∈ R;
    foreach η ∈ layer(l − 1) do
        let V′ := VASS(η) be the VASS associated to η;
        decompose (St(V′), Trns(V′) \ R) into SCCs;
        foreach SCC S of (St(V′), Trns(V′) \ R) do
            create a child η′ of η with VASS(η′) = S;
    foreach x ∈ Var with vExp(x) = ∞ do
        if r(x, ι) > 0 then set vExp(x) := l;
    if there are no x ∈ Var, t ∈ Trns(V) with l < vExp(x) + tExp(t) < ∞ then
        return "V has at least exponential complexity";
    l := l + 1;
until vExp(x) ≠ ∞ and tExp(t) ≠ ∞ for all x ∈ Var and t ∈ Trns(V);
```

**Algorithm 1:** Computes transition and variable bounds for a VASS V

constraint system (I): there exists $\mu \in \mathbb{Z}^{U}$ with
$$D_{ext}\,\mu \ge 0 \qquad \mu \ge 0 \qquad F|_U\,\mu = 0$$
Maximization Objective: Maximize the number of inequalities with $(D_{ext}\,\mu)(x,\eta) > 0$ and $\mu(t) > 0$.

constraint system (II): there exist $r \in \mathbb{Z}^{Var_{ext}}$, $z \in \mathbb{Z}^{St(V)}$ with
$$r \ge 0 \qquad z \ge 0 \qquad D_{ext}^T\, r + F|_U^T\, z \le 0$$
Maximization Objective: Maximize the number of inequalities with $r(x,\eta) > 0$ and $(D_{ext}^T\, r + F|_U^T\, z)(t) < 0$.

**Fig. 2.** Constraint Systems (I) and (II) used by Algorithm 1

*Construction of the next layer in* T*.* For each node η in layer l − 1, Algorithm 1 will create children by removing the transitions in R. This is done as follows: Given a node η in layer l − 1, Algorithm 1 considers the VASS V′ = VASS(η) associated to η. Then, (*St*(V′), *Trns*(V′) \ R) is decomposed into its SCCs. Finally, for each SCC S of (*St*(V′), *Trns*(V′) \ R) a child η′ of η is created with VASS(η′) = S. Clearly, the new nodes in layer l are labelled by disjoint sub-VASSs of V.
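The decomposition step can be illustrated as follows. This sketch (data representation ours) uses Kosaraju's algorithm and reproduces the layer-1 split of Vrun from Example 7, where removing R = {s1→s3, s4→s2} splits the state graph into the SCCs {s1, s2} and {s3, s4}:

```python
def sccs(states, transitions):
    """Strongly connected components of (states, transitions) via
    Kosaraju's algorithm; `transitions` is a set of (source, target) pairs."""
    out = {s: [] for s in states}
    rev = {s: [] for s in states}
    for a, b in transitions:
        out[a].append(b)
        rev[b].append(a)

    order, seen = [], set()

    def dfs(graph, start, acc):
        """Iterative DFS; appends nodes to `acc` in post-order."""
        if start in seen:
            return
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next((v for v in it if v not in seen), None)
            if nxt is None:
                stack.pop()
                acc.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))

    for s in states:                     # first pass: finishing order
        dfs(out, s, order)
    seen = set()
    comps = []
    for s in reversed(order):            # second pass: reverse graph
        if s not in seen:
            comp = []
            dfs(rev, s, comp)
            comps.append(frozenset(comp))
    return comps

states = ["s1", "s2", "s3", "s4"]
kept = {("s1", "s1"), ("s2", "s2"), ("s1", "s2"), ("s2", "s1"),
        ("s3", "s3"), ("s4", "s4"), ("s3", "s4"), ("s4", "s3")}
assert set(sccs(states, kept)) == {frozenset({"s1", "s2"}),
                                   frozenset({"s3", "s4"})}
```

Each resulting SCC S then labels one child node of η, so the children of the same node are disjoint by construction.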

*The transitions of the next layer.* The following lemma states that the new layer l of T contains all transitions of layer l − 1 except for the transitions R; the lemma is due to the fact that every transition in U \ R belongs to a cycle and hence to some SCC that is part of the new layer l.

**Lemma 6.** *We consider the new layer constructed during loop iteration* l *of Algorithm 1: we have* $U \setminus R = \bigcup_{\eta \in \mathsf{layer}(l)} Trns(\mathsf{VASS}(\eta))$*.*

*Discovering variable bounds.* For each x ∈ *Var* with vExp(x) = ∞, Algorithm 1 checks whether r(x, ι) > 0 (we point out that the optimization criterion in constraint system (*II*) tries to find as many such x with r(x, ι) > 0 as possible). Algorithm 1 then sets vExp(x) := l for all those variables.

*The check for exponential complexity.* In each loop iteration, Algorithm 1 checks whether there are x ∈ *Var*, t ∈ *Trns*(V) with l < vExp(x) + tExp(t) < ∞. If there are no such x and t, then we can conclude that V is at least exponential (see Theorem 9 below). Otherwise, Algorithm 1 increments l and continues with the construction of the next layer in the next loop iteration.

*Termination criterion.* The algorithm proceeds until either exponential complexity has been detected or until vExp(x) ≠ ∞ and tExp(t) ≠ ∞ for all x ∈ *Var* and t ∈ *Trns*(V) (i.e., bounds have been computed for all variables and transitions).

*Invariants.* We now state some simple invariants maintained by Algorithm 1, which are easy to verify:


*Example 7.* We sketch the execution of Algorithm 1 on Vrun. In iteration l = 1, we have $Var_{ext} = \{(x,\iota), (y,\iota), (z,\iota)\}$, and thus matrix $D_{ext}$ is identical to the matrix D. Hence, constraint systems (*I*) and (*II*) are identical to constraint systems (P) and (Q), whose optimal solutions $\mu = (1\,4\,4\,1\,1\,1\,1\,1\,0\,0)^T$ and $r = (2\,2\,0)^T$, $z = (0\,0\,1\,1)^T$ we have discussed in Example 5. Algorithm 1 then sets tExp(s₁ → s₃) = 1 and tExp(s₄ → s₂) = 1, creates two children η_A and η_B of ι labelled by $V_A = (\{s_1, s_2\}, \{s_1 \to s_1, s_1 \to s_2, s_2 \to s_2, s_2 \to s_1\})$ and $V_B = (\{s_3, s_4\}, \{s_3 \to s_3, s_3 \to s_4, s_4 \to s_4, s_4 \to s_3\})$, and sets vExp(x) = 1 and vExp(y) = 1. In iteration l = 2, we have $Var_{ext} = \{(x,\eta_A), (y,\eta_A), (x,\eta_B), (y,\eta_B), (z,\iota)\}$ and the matrix $D_{ext}$ stated in Figure 3. Algorithm 1 obtains $\mu = (1\,1\,1\,1\,0\,0\,0\,0)^T$ and $r = (1\,2\,2\,1\,1)^T$, $z = (0\,0\,0\,0)^T$ as optimal solutions to (*I*) and (*II*). Algorithm 1 then

$$D_{ext} = \begin{pmatrix} -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 \end{pmatrix}, \quad D_{ext} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -1 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix},$$
with column order $s_1 \to s_1$, $s_2 \to s_2$, $s_3 \to s_3$, $s_4 \to s_4$, $s_2 \to s_1$, $s_1 \to s_2$, $s_4 \to s_3$, $s_3 \to s_4$ (left matrix) resp. $s_1 \to s_1$, $s_2 \to s_2$, $s_3 \to s_3$, $s_4 \to s_4$ (right matrix), and row order $(x,\eta_A), (y,\eta_A), (x,\eta_B), (y,\eta_B), (z,\iota)$ (left) resp. $(x,\eta_1), (y,\eta_1), (x,\eta_2), (y,\eta_2), (x,\eta_3), (y,\eta_3), (x,\eta_4), (y,\eta_4), (z,\eta_A), (z,\eta_B)$ (right), from top to bottom.

**Fig. 3.** The extended update matrices during iterations l = 2 (left) and l = 3 (right) of Algorithm 1 on the running example Vrun from Section 1.

sets tExp(s₁ → s₂) = tExp(s₂ → s₁) = tExp(s₃ → s₄) = tExp(s₄ → s₃) = 2, creates the children η₁, η₂ resp. η₃, η₄ of η_A resp. η_B with η_i labelled by $V_i = (\{s_i\}, \{s_i \to s_i\})$, and sets vExp(z) = 2. In iteration l = 3, we have $Var_{ext} = \{(x,\eta_1), (y,\eta_1), (x,\eta_2), (y,\eta_2), (x,\eta_3), (y,\eta_3), (x,\eta_4), (y,\eta_4), (z,\eta_A), (z,\eta_B)\}$ and the matrix $D_{ext}$ stated in Figure 3. Algorithm 1 obtains $\mu = (0\,0\,0\,0)^T$ and $r = (1\,1\,1\,3\,3\,1\,1\,1\,1\,1)^T$, $z = (0\,0\,0\,0)^T$ as optimal solutions to (*I*) and (*II*). Algorithm 1 then sets tExp(s_i → s_i) = 3, for all i, and terminates.

We now state the main properties of Algorithm 1:

**Lemma 8.** *Algorithm 1 always terminates.*

**Theorem 9.** *If Algorithm 1 returns "*V *has at least exponential complexity", then* $comp_V(N) \in 2^{\Omega(N)}$*, and we have* $\mathsf{tbound}_t(N) \in 2^{\Omega(N)}$ *for all* t ∈ *Trns*(V) *with* tExp(t) = ∞ *and* $\mathsf{vbound}_x(N) \in 2^{\Omega(N)}$ *for all* x ∈ *Var with* vExp(x) = ∞*.*

The proof of Theorem 9 is stated in Section 8. We now assume that Algorithm 1 does not return "V has at least exponential complexity". Then, Algorithm 1 must terminate with tExp(t) ≠ ∞ and vExp(x) ≠ ∞ for all t ∈ *Trns*(V) and x ∈ *Var*. The following result states that tExp and vExp contain the precise exponents of the asymptotic transition and variable bounds of V:

**Theorem 10.** $\mathsf{vbound}_x(N) \in \Theta(N^{\mathsf{vExp}(x)})$ *for all* x ∈ *Var and* $\mathsf{tbound}_t(N) \in \Theta(N^{\mathsf{tExp}(t)})$ *for all* t ∈ *Trns*(V)*.*

The upper bounds of Theorem 10 will be proved in Section 5 (Theorem 16) and the lower bounds in Section 6 (Corollary 20).

We will prove in Section 7 that the exponents of the variable and transition bounds are bounded exponentially in the dimension of V:

**Theorem 11.** *We have* $\mathsf{vExp}(x) \le 2^{|Var|}$ *for all* x ∈ *Var and* $\mathsf{tExp}(t) \le 2^{|Var|}$ *for all* t ∈ *Trns*(V)*.*

Finally, we obtain the following corollary from Theorems 10 and 11:

**Corollary 12.** *Let* V *be a connected VASS. Then, either* $comp_V(N) \in 2^{\Omega(N)}$ *or* $comp_V(N) \in \Theta(N^i)$ *for some computable* $1 \le i \le 2^{|Var|}$*.*

### **4.1 Complexity of Algorithm 1**

In the remainder of this section we will establish the following result:

**Theorem 13.** *Algorithm 1 (with the below stated optimization) can be implemented in polynomial time with regard to the size of the input VASS* V*.*

We will argue that A) every loop iteration of Algorithm 1 only takes polynomial time, and B) that polynomially many loop iterations are sufficient (this only holds for the optimization of the algorithm discussed below).

Let V be a VASS, let m = |*Trns*(V)| be the number of transitions of V, and let n = |*Var*| be the dimension of V. We note that |layer(l)| ≤ m for every layer l of T, because the VASSs of the nodes in the same layer are disjoint.

A) Clearly, removing the decreasing transitions and computing the strongly connected components can be done in polynomial time. It remains to argue about constraint systems (*I*) and (*II*). We observe that $|Var_{ext}| = |\{(x,\eta) \mid \eta \in \mathsf{layer}(l - \mathsf{vExp}(x))\}| \le n \cdot m$ and $|U| \le m$. Hence the size of constraint systems (*I*) and (*II*) is polynomial in the size of V. Moreover, constraint systems (*I*) and (*II*) can be solved in PTIME as noted in Section 3.

B) We do not a-priori have a bound on the number of iterations of the main loop of Algorithm 1. (Theorem 11 implies that the number of iterations is at most exponential; however, we do not use this result here). We will shortly state an improvement of Algorithm 1 that ensures that polynomially many iterations are sufficient. The underlying insight is that certain layers of the tree do not need to be constructed explicitly. This insight is stated in the lemma below:

**Lemma 14.** *We consider the point in time when the execution of Algorithm 1 reaches line* l := l + 1 *during some loop iteration* l ≥ 1*. Let RelevantLayers* = {tExp(t) + vExp(x) | x ∈ *Var*, t ∈ *Trns*(V)} *and let* l′ = min{l″ | l″ > l, l″ ∈ *RelevantLayers*}*. Then,* vExp(x) ≠ i *and* tExp(t) ≠ i *for all* x ∈ *Var,* t ∈ *Trns*(V) *and* l < i < l′*.*

We now present the optimization that achieves polynomially many loop iterations. We replace the line l := l + 1 by the two lines *RelevantLayers* := {tExp(t) + vExp(x) | x ∈ *Var*, t ∈ *Trns*(V)} and l := min{l′ | l′ > l, l′ ∈ *RelevantLayers*}. The effect of these two lines is that Algorithm 1 directly skips to the next relevant layer. Lemma 14, stated above, justifies this optimization: First, no new variable or transition bound is discovered in the intermediate layers l < i < l′. Second, each intermediate layer l < i < l′ has the same number of nodes as layer l, and these nodes are labelled by the same sub-VASSs as the nodes in layer l (otherwise there would be a transition with transition bound i for some l < i < l′); hence, whenever needed, Algorithm 1 can construct a missing layer l < i < l′ on-the-fly from layer l.

We now analyze the number of loop iterations of the optimized algorithm. We recall that the value of each vExp(x) and tExp(t) is changed at most once from ∞ to some value ≠ ∞. Hence, Algorithm 1 encounters at most n · m different values in the set *RelevantLayers* = {tExp(t) + vExp(x) | x ∈ *Var*, t ∈ *Trns*(V)} during execution. Thus, the number of loop iterations is bounded by n · m.

## **5 Proof of the Upper Bound Theorem**

We begin by stating a proof principle for obtaining upper bounds.

**Proposition 15 (Bound Proof Principle).** *Let* V *be a VASS. Let* U ⊆ *Trns*(V) *be a subset of the transitions of* V*. Let* w : *Cfg*(V) → $\mathbb{N}$ *and* $\mathsf{inc}_t : \mathbb{N} \to \mathbb{N}$*, for every* t ∈ *Trns*(V) \ U*, be functions such that for every trace* $\zeta = (s_0,\nu_0) \xrightarrow{d_1} (s_1,\nu_1) \xrightarrow{d_2} \cdots$ *of* V *with* init(ζ) ≤ N *we have for every* i ≥ 0 *that*

*1)* $s_i \xrightarrow{d_{i+1}} s_{i+1} \in U$ *implies* $w(s_i,\nu_i) \ge w(s_{i+1},\nu_{i+1})$*, and 2)* $t = s_i \xrightarrow{d_{i+1}} s_{i+1} \in Trns(V) \setminus U$ *implies* $w(s_i,\nu_i) + \mathsf{inc}_t(N) \ge w(s_{i+1},\nu_{i+1})$*.*

*We call such a function* w *a* complexity witness *and the associated* inc<sup>t</sup> *functions the* increase certificates*.*

*Let* t ∈ U *be a transition on which* w decreases*, i.e., we have* $w(s_1,\nu_1) \ge w(s_2,\nu_2) + 1$ *for every step* $(s_1,\nu_1) \xrightarrow{d} (s_2,\nu_2)$ *of* V *with* $t = s_1 \xrightarrow{d} s_2$*. Then,*

$$\mathsf{tbound}_t(N) \le \max_{(s,\nu) \in Cfg(V),\ \lVert\nu\rVert \le N} w(s,\nu) + \sum_{t' \in Trns(V) \setminus U} \mathsf{tbound}_{t'}(N) \cdot \mathsf{inc}_{t'}(N).$$

*Further, let* x ∈ *Var be a variable such that* ν(x) ≤ w(s, ν) *for all* (s, ν) ∈ *Cfg*(V)*. Then,*

$$\mathsf{vbound}_x(N) \le \max_{(s,\nu) \in Cfg(V),\ \lVert\nu\rVert \le N} w(s,\nu) + \sum_{t' \in Trns(V) \setminus U} \mathsf{tbound}_{t'}(N) \cdot \mathsf{inc}_{t'}(N).$$

*Proof Outline of the Upper Bound Theorem.* Let V be a VASS for which Algorithm 1 does not report exponential complexity. We will prove by induction on the loop iteration l that $\mathsf{vbound}_x(N) \in O(N^l)$ for every x ∈ *Var* with vExp(x) = l and that $\mathsf{tbound}_t(N) \in O(N^l)$ for every t ∈ *Trns*(V) with tExp(t) = l.

We now consider some loop iteration l ≥ 1. Let $U = \bigcup_{\eta \in \mathsf{layer}(l-1)} Trns(\mathsf{VASS}(\eta))$ be the transitions, $Var_{ext}$ be the set of extended variables and $D_{ext} \in \mathbb{Z}^{Var_{ext} \times U}$ be the update matrix considered by Algorithm 1 during loop iteration l. Let r, z be some optimal solution to constraint system (*II*) computed by Algorithm 1 during loop iteration l. The main idea for the upper bound proof is to use the quasi-ranking function from Lemma 3 as witness function for the Bound Proof Principle. In order to apply Lemma 3 we need to consider the VASS associated to the matrices in constraint system (*II*): Let $V_{ext}$ be the VASS over variables $Var_{ext}$ associated to update matrix $D_{ext}$ and flow matrix $F|_U$. From Lemma 3 we get that *rank*(r, z) : *Cfg*($V_{ext}$) → $\mathbb{N}$ is a quasi-ranking function for $V_{ext}$. We now need to relate V to the extended VASS $V_{ext}$ in order to be able to use this quasi-ranking function. We do so by extending valuations over *Var* to valuations over $Var_{ext}$. For every state s ∈ *St*(V) and valuation ν : *Var* → $\mathbb{N}$, we define the *extended valuation* $\mathsf{ext}_s(\nu) : Var_{ext} \to \mathbb{N}$ by setting

$$\mathsf{ext}_s(\nu)(x,\eta) = \begin{cases} \nu(x), & \text{if } s \in St(\mathsf{VASS}(\eta)),\\ 0, & \text{otherwise.} \end{cases}$$

As a direct consequence of the definition of extended valuations, we have that $(s, \mathsf{ext}_s(\nu)) \in Cfg(V_{ext})$ for all (s, ν) ∈ *Cfg*(V), and that $(s_1, \mathsf{ext}_{s_1}(\nu_1)) \xrightarrow{D_{ext}(t)} (s_2, \mathsf{ext}_{s_2}(\nu_2))$ is a step of $V_{ext}$ for every step $(s_1,\nu_1) \xrightarrow{d} (s_2,\nu_2)$ of V with $t = s_1 \xrightarrow{d} s_2 \in U$. We now define the witness function w by setting

$$w(s,\nu) = \mathit{rank}(r,z)(s, \mathsf{ext}_s(\nu)) \quad \text{for all } (s,\nu) \in Cfg(V).$$

We immediately get from Lemma 3 that w maps configurations to the non-negative integers and that condition 1) of the Bound Proof Principle is satisfied. Indeed, we get from the first item of Lemma 3 that w(s, ν) ≥ 0 for all (s, ν) ∈ *Cfg*(V), and from the second item that $w(s_1,\nu_1) \ge w(s_2,\nu_2)$ for every step $(s_1,\nu_1) \xrightarrow{d} (s_2,\nu_2)$ of V with $t = s_1 \xrightarrow{d} s_2 \in U$; moreover, the inequality is strict if $(D_{ext}^T r + F|_U^T z)(t) < 0$, i.e., the witness function w decreases for transitions t with tExp(t) = l. It remains to establish condition 2) of the Bound Proof Principle. We will argue that we can find increase certificates $\mathsf{inc}_t(N) \in O(N^{l - \mathsf{tExp}(t)})$ for all t ∈ *Trns*(V) \ U. We note that tExp(t) < l for all t ∈ *Trns*(V) \ U, and hence the induction assumption can be applied for such t. We can then derive the desired bounds from the Bound Proof Principle because of
$$\sum_{t \in Trns(V) \setminus U} \mathsf{tbound}_t(N) \cdot \mathsf{inc}_t(N) = \sum_{t \in Trns(V) \setminus U} O(N^{\mathsf{tExp}(t)}) \cdot O(N^{l - \mathsf{tExp}(t)}) = O(N^l).$$

**Theorem 16.** $\mathsf{vbound}_x(N) \in O(N^{\mathsf{vExp}(x)})$ *for all* x ∈ *Var and* $\mathsf{tbound}_t(N) \in O(N^{\mathsf{tExp}(t)})$ *for all* t ∈ *Trns*(V)*.*

## **6 Proof of the Lower Bound Theorem**

The following lemma will allow us to consider traces $\zeta_N$ with $init(\zeta_N) \in O(N)$ instead of $init(\zeta_N) \le N$ when proving asymptotic lower bounds.

**Lemma 17.** *Let* V *be a VASS, let* t ∈ *Trns*(V) *be a transition and let* x ∈ *Var be a variable. If there are traces* $\zeta_N$ *with* $init(\zeta_N) \in O(N)$ *and* $instance(\zeta_N, t) \ge N^i$*, then* $\mathsf{tbound}_t(N) \in \Omega(N^i)$*. If there are traces* $\zeta_N$ *with* $init(\zeta_N) \in O(N)$ *that reach a final valuation* ν *with* $\nu(x) \ge N^i$*, then* $\mathsf{vbound}_x(N) \in \Omega(N^i)$*.*

The lower bound proof uses the notion of a *pre-path*, which relaxes the notion of a path: a pre-path σ = t_1 ··· t_k is a finite sequence of transitions t_i = s_i −d_i→ s'_i. Note that we do not require for subsequent transitions that the end state of one transition is the start state of the next, i.e., we do not require s'_i = s_{i+1}. We generalize notions from paths to pre-paths in the obvious way, e.g., we set *val*(σ) = Σ_{i∈[1,k]} d_i and denote by instance(σ, t), for t ∈ *Trns*(V), the number of times σ contains the transition t. We say the pre-path σ *can be executed from valuation* ν if there are valuations ν_i ≥ 0 with ν_{i+1} = ν_i + d_{i+1} for all 0 ≤ i < k and ν = ν_0; we further say that σ *reaches* valuation ν' if ν' = ν_k. We will need the following relationship between execution and traces: in case a pre-path σ is actually a path, σ can be executed from valuation ν if and only if there is a trace with initial valuation ν that uses the same sequence of transitions as σ. Two pre-paths σ = t_1 ··· t_k and σ' = t'_1 ··· t'_l can be *shuffled* into a pre-path σ'' = t''_1 ··· t''_{k+l} if σ'' is an order-preserving interleaving of σ and σ'; formally, there are injective monotone functions f : [1, k] → [1, k + l] and g : [1, l] → [1, k + l] with f([1, k]) ∩ g([1, l]) = ∅ such that t''_{f(i)} = t_i for all i ∈ [1, k] and t''_{g(i)} = t'_i for all i ∈ [1, l]. Further, for d ≥ 1 and a pre-path σ, we denote by σ^d = σσ ··· σ (d copies) the pre-path that consists of d subsequent copies of σ.
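As an illustrative aside (not part of the paper's formal development), executability of a pre-path and the shuffle relation can be sketched in a few lines of Python; the function names `can_execute` and `is_shuffle` and the tuple encoding of displacement vectors are our own choices.

```python
from functools import lru_cache

def can_execute(pre_path, nu):
    """Check whether a pre-path, given as a list of displacement tuples d_i,
    can be executed from valuation nu: every intermediate valuation nu_i
    must stay componentwise non-negative. Returns the reached valuation,
    or None if execution gets stuck."""
    nu = list(nu)
    for d in pre_path:
        nu = [v + delta for v, delta in zip(nu, d)]
        if any(v < 0 for v in nu):
            return None
    return tuple(nu)

def is_shuffle(sigma, sigma1, sigma2):
    """Check whether sigma is an order-preserving interleaving (a shuffle)
    of sigma1 and sigma2, by a memoised search over positions (i, j)."""
    if len(sigma) != len(sigma1) + len(sigma2):
        return False

    @lru_cache(maxsize=None)
    def go(i, j):
        k = i + j
        if k == len(sigma):
            return True
        if i < len(sigma1) and sigma[k] == sigma1[i] and go(i + 1, j):
            return True
        return j < len(sigma2) and sigma[k] == sigma2[j] and go(i, j + 1)

    return go(0, 0)

# A pre-path that first decrements and then increments the first counter
# can be executed from (1, 0) but not from (0, 0):
sigma = [(-1, 0), (1, 1)]
print(can_execute(sigma, (1, 0)))  # (1, 1)
print(can_execute(sigma, (0, 0)))  # None
```

This matches the relaxation above: `can_execute` never inspects states, only displacements, since pre-paths drop the adjacency requirement.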

For the remainder of this section, we fix a VASS V for which Algorithm 1 does not report exponential complexity and we fix the computed tree T and bounds vExp, tExp. We further need to use the solutions to constraint system (*I*) computed during the run of Algorithm 1: for every layer l ≥ 1 and node η ∈ layer(l), we fix a cycle C(η) that contains μ(t) instances of every t ∈ *Trns*(VASS(η)), where μ is an optimal solution to constraint system (*I*) during loop iteration l. The existence of such cycles is stated in Lemma 18 below. We note that this definition ensures *val*(C(η)) = Σ_{t∈Trns(VASS(η))} D(t) · μ(t). Further, for the root node ι, we fix an arbitrary cycle C(ι) that uses all transitions of V at least once.

**Lemma 18.** *Let μ be an optimal solution to constraint system (I) during loop iteration l of Algorithm 1. Then there is a cycle C(η) for every η ∈ layer(l) that contains exactly μ(t) instances of every transition t ∈ Trns(VASS(η)).*

*Proof Outline of the Lower Bound Theorem.*

**Step I)** We define a pre-path τl, for every l ≥ 1, with the following properties:

- a) ν(x) ∈ O(N^{vExp(x)}) for x ∈ *Var* with vExp(x) ≤ l, and
- b) ν(x) ∈ O(N^l) for x ∈ *Var* with vExp(x) ≥ l + 1.

The difficulty in the construction of the pre-paths τ_l lies in ensuring Property 5). The construction of the τ_l proceeds along the tree T, using that the cycles C(η) have been obtained from solutions to constraint system (*I*).

**Step II)** It is now a direct consequence of Properties 3)–5) stated above that we can choose a sufficiently large k > 0 such that for every l ≥ 0 the pre-path ρ_l = τ_0^k τ_1^k ··· τ_l^k (the concatenation of k copies of each τ_i, setting τ_0 = C(ι)^N) can be executed from some valuation ν and reaches a valuation ν' with

1) ν ∈ O(N),
2) ν'(x) ≥ kN^{vExp(x)} for all x ∈ *Var* with vExp(x) ≤ l, and
3) ν'(x) ≥ kN^{l+1} for all x ∈ *Var* with vExp(x) ≥ l + 1.

The above stated properties of the pre-path ρ_{l_max}, where l_max is the maximal layer of T, would suffice to conclude the lower bound proof, except that we need to extend the proof from pre-paths to proper paths.

**Step III)** In order to extend the proof from pre-paths to paths, we make use of the concept of shuffling. For all l ≥ 0, we will define paths γ_l that can be obtained by shuffling the pre-paths ρ_0, ρ_1, ..., ρ_l. The path γ_{l_max}, where l_max is the maximal layer of T, then has the desired properties and allows us to conclude the lower bound proof with the following result:

**Theorem 19.** *There are traces ζ_N with init(ζ_N) ∈ O(N) such that ζ_N ends in a configuration (s_N, ν_N) with ν_N(x) ≥ N^{vExp(x)} for all variables x ∈ Var, and we have instance(ζ_N, t) ≥ N^{tExp(t)} for all transitions t ∈ Trns(V).*

With Lemma 17 we get the desired lower bounds from Theorem 19:

**Corollary 20.** *vbound_N(x) ∈ Ω(N^{vExp(x)}) for all x ∈ Var and tbound_N(t) ∈ Ω(N^{tExp(t)}) for all t ∈ Trns(V).*

## **7 The Size of the Exponents**

For the remainder of this section, we fix a VASS V for which Algorithm 1 does not report exponential complexity, and we fix the computed tree T and bounds vExp, tExp. Additionally, we fix a vector z_l ∈ Z^{St(V)} for every layer l of T and a vector r_η ∈ Z^{Var} for every node η ∈ layer(l) as follows: let r, z be an optimal solution to constraint system (*II*) in iteration l + 1 of Algorithm 1. We then set z_l = z. For every η ∈ layer(l) we define r_η by setting r_η(x) = r(x, η'), where η' ∈ layer(l − vExp(x)) is the unique ancestor of η in layer l − vExp(x). The following properties are immediate from the definition:

**Proposition 21.** *For every layer l of T and node η ∈ layer(l) we have:*


For a vector r ∈ Z^{Var}, we define the *potential* of r by setting pot(r) = max{vExp(x) | x ∈ *Var*, r(x) ≠ 0}, where we set max ∅ = 0. The motivation for this definition is that we have r^T ν ∈ O(N^{pot(r)}) for every valuation ν reachable by a trace ζ with init(ζ) ≤ N. We will now define the *potential* of a set of vectors Z ⊆ Z^{Var}. Let M be a matrix whose columns are the vectors of Z and whose rows are ordered according to the variable bounds, i.e., if the row associated to variable x' is above the row associated to variable x, then we have vExp(x') ≥ vExp(x). Let L be some lower triangular matrix obtained from M by elementary column operations. We now define pot(Z) = Σ_{column r of L} pot(r), where we set Σ ∅ = 0. We note that pot(Z) is well-defined, because the value pot(Z) does not depend on the choice of M and L.

We next state an upper bound on potentials. Let l ≥ 0 and let B_l = {vExp(x) | x ∈ *Var*, vExp(x) < l} be the multiset of variable bounds below l (one occurrence per variable x). We set varsum(l) = 1, for B_l = ∅, and varsum(l) = Σ B_l, otherwise. The following statement is a direct consequence of the definitions:
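To make the two definitions above concrete, here is a small Python sketch (our own illustration, not from the paper; `vExp` is an assumed example mapping from variables to their bounds):

```python
def pot(r, vExp):
    """Potential of a vector r (a dict over variables): the maximal
    bound vExp(x) over variables with r(x) != 0, and 0 if r = 0."""
    return max((vExp[x] for x, c in r.items() if c != 0), default=0)

def varsum(l, vExp):
    """Sum of the variable bounds strictly below l (one summand per
    variable); 1 if no variable bound lies below l."""
    below = [e for e in vExp.values() if e < l]
    return sum(below) if below else 1

vExp = {'x': 1, 'y': 1, 'z': 2}
print(pot({'x': 5, 'y': 0, 'z': -2}, vExp))  # 2 (r is nonzero at x and z)
print(varsum(3, vExp))                       # 4 (= 1 + 1 + 2)
print(varsum(1, vExp))                       # 1 (no bound below 1)
```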

**Proposition 22.** *Let Z ⊆ Z^{Var} be a set of vectors such that r(x) = 0 for all r ∈ Z and x ∈ Var with vExp(x) > l. Then, we have pot(Z) ≤ varsum(l + 1).*

We define pot(η) = pot({r_{η'} | η' is a strict ancestor of η}) as the *potential* of a node η. We note that pot(η) ≤ varsum(l + 1) for every node η ∈ layer(l) by Proposition 22. Now we are able to state the main results of this section:

**Lemma 23.** *Let η be a node in T. Then, every trace ζ with init(ζ) ≤ N enters VASS(η) at most O(N^{pot(η)}) times, i.e., ζ contains at most O(N^{pot(η)}) transitions s −d→ s' with s ∉ St(VASS(η)) and s' ∈ St(VASS(η)).*

**Lemma 24.** *For every layer l, vExp(x) = l implies vExp(x) ≤ varsum(l), and tExp(t) = l implies tExp(t) ≤ varsum(l).*

The next result follows from Lemma 24 only by arithmetic manipulations and induction on l:

**Lemma 25.** *Let l be some layer. Let k be the number of variables x ∈ Var with vExp(x) < l. Then, varsum(l) ≤ 2^k.*

Theorem 11 is then a direct consequence of Lemmas 24 and 25 (using k ≤ |*Var*|).
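The induction behind Lemma 25 can be illustrated numerically (our own sketch, not from the paper): in the worst case every new variable bound equals the sum of all bounds fixed so far, so the bounds grow as 1, 1, 2, 4, 8, ... and varsum at most doubles with each additional variable.

```python
def worst_case_bounds(n):
    """Greedily pick n variable bounds, each as large as Lemma 24
    permits, namely the sum of all previously picked bounds (1 if none)."""
    bounds = []
    for _ in range(n):
        bounds.append(sum(bounds) or 1)
    return bounds

bounds = worst_case_bounds(6)
print(bounds)  # [1, 1, 2, 4, 8, 16]

# Lemma 25: for every realised bound l, varsum(l) <= 2^k, where k is
# the number of variables with bound strictly below l.
for l in set(bounds):
    below = [b for b in bounds if b < l]
    assert (sum(below) if below else 1) <= 2 ** len(below)
```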

## **8 Exponential Witness**

The following lemma from [15] states a condition that is sufficient for a VASS to have exponential complexity². We will use this lemma to prove Theorem 9:

**Lemma 26 (Lemma 10 of [15]).** *Let V be a connected VASS, let U, W be a partitioning of Var and let C_1, ..., C_m be cycles such that a) val(C_i)(x) ≥ 0 for all x ∈ U and 1 ≤ i ≤ m, and b) Σ_i val(C_i)(x) ≥ 1 for all x ∈ W. Then, there is a c > 1 and paths π_N such that 1) π_N can be executed from initial valuation N · **1**, 2) π_N reaches a valuation ν with ν(x) ≥ c^N for all x ∈ W, and 3) (C_i)^{c^N} is a sub-path of π_N for each 1 ≤ i ≤ m.*

We now outline the proof of Theorem 9: we assume that Algorithm 1 returned "V has at least exponential complexity" in loop iteration l. According to Lemma 18, there are cycles C(η), for every node η ∈ layer(l), that contain μ(t) instances of every transition t ∈ *Trns*(VASS(η)). One can then show that the cycles C(η) and the sets U = {x ∈ *Var* | vExp(x) ≤ l}, W = {x ∈ *Var* | vExp(x) > l} satisfy the requirements of Lemma 26, which establishes Theorem 9.
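The two conditions of Lemma 26 are easy to check mechanically. The following Python sketch (the function name and the dict encoding of cycle effects are our own) tests conditions a) and b) for given cycle effects val(C_i):

```python
def satisfies_lemma_26(cycle_effects, U, W):
    """cycle_effects: list of dicts mapping each variable to the
    cumulative effect val(C_i) of cycle C_i on that variable.
    Returns True iff a) every cycle is non-negative on all of U, and
    b) the summed effect of all cycles is at least 1 on every x in W."""
    if any(v[x] < 0 for v in cycle_effects for x in U):
        return False  # condition a) violated
    return all(sum(v[x] for v in cycle_effects) >= 1 for x in W)

# Two cycles over variables x (in U) and y, z (in W): each cycle may
# decrease a W-variable, as long as the combined effect gains on it.
c1 = {'x': 0, 'y': 2, 'z': -1}
c2 = {'x': 1, 'y': -1, 'z': 2}
print(satisfies_lemma_26([c1, c2], U={'x'}, W={'y', 'z'}))  # True
```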

² Our formalization differs from [15], but it is easy to verify that our conditions a) and b) are equivalent to the conditions on the cycles in the 'iteration schemes' of [15].

## **References**


21. Florian Zuleger. The polynomial complexity of vector addition systems with states. CoRR, abs/1907.01076, 2019.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Author Index

Adámek, Jiří 17
Akshay, S. 37
Alvarez-Picallo, Mario 57

Bérard, Béatrice 97
Bollig, Benedikt 97
Bonchi, Filippo 77
Brunet, Paul 381

Colcombet, Thomas 119

Dal Lago, Ugo 136
Di Giusto, Cinzia 157
Diskin, Zinovy 177

Ehrhard, Thomas 198
Exibard, Léo 217

Fijalkow, Nathanaël 119
Filiot, Emmanuel 217
Finkel, Alain 237
Fiore, Marcelo 277
Fiore, Marcelo P. 257

Gehrke, Mai 299
Genest, Blaise 37
Goncharov, Sergey 542
Grosu, Radu 1
Guerrieri, Giulio 136
Gundersen, Tom 582

Haddad, Serge 237
Heijltjes, Willem 136, 582
Hélouët, Loïc 37
Hoffmann, Jan 359
Huot, Mathieu 319

Jakl, Tomáš 299
Johann, Patricia 339

Kahn, David M. 359
Kappé, Tobias 381
Khmelnitsky, Igor 237
Kupke, Clemens 602
Kura, Satoshi 401

Laird, James 422
Lanese, Ivan 442
Laversa, Laetitia 157
Lehaut, Mathieu 97
Lemay, Jean-Simon Pacaud 57
Löding, Christof 522
Lozes, Etienne 157

Mansutti, Alessio 462
Mehmood, Usama 1
Milius, Stefan 17
Mital, Sharvik 37
Moss, Lawrence S. 17

Neele, Thomas 482

Ohlmann, Pierre 119

Parigot, Michel 582
Péchoux, Romain 562
Perdrix, Simon 562
Phillips, Iain 442
Piedeleu, Robin 77
Pientka, Brigitte 502
Pirogov, Anton 522
Pitts, Andrew M. 257
Polonsky, Andrew 339
Polzer, Miriam 542

Reggio, Luca 299
Rennela, Mathys 562
Reynier, Pierre-Alain 217
Rot, Jurriaan 602
Roy, Shouvik 1

Saville, Philip 277
Schöpp, Ulrich 502
Sherratt, David 582
Silva, Alexandra 381, 602
Smolka, Scott A. 1
Sobociński, Paweł 77
Staton, Sam 319
Steenkamp, S. C. 257
Stoller, Scott D. 1
Sznajder, Nathalie 97

Tiwari, Ashish 1

Ulidowski, Irek 442

Vákár, Matthijs 319
Valmari, Antti 482
van Heerdt, Gerco 602

Wagemaker, Jana 381
Willemse, Tim A. C. 482

Zamdzhiev, Vladimir 562
Zanasi, Fabio 77, 381
Zuleger, Florian 622